Adaptable and learnable user interfaces for research tools: The Word Tree Corpus Interface
A corpus is ‘a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research’ (Sinclair 2005). In recent years a number of relatively small, highly targeted corpora have been created to support the investigation of specified types of speech and writing, for the benefit of researchers, teachers and language students learning about the language requirements of specific domains. Generally the small corpora that are most used and cited are those which enable the easiest and fastest extraction of relevant data; potentially useful corpora are often underused because they lack an online interface, or offer an interface which is not completely fit for purpose because it was originally designed for lexicographers and information scientists working with much larger general corpora.
The most important information contained in a corpus concerns patterns of language use, but these patterns are often hard to discern when corpus data is presented in the standard way, using Key Words in Context (KWIC) concordance lines. Our project will build a new kind of ‘Word Tree’ interface which will present these patterns visually, enabling users to interact with the surface layer of data, but also to enter increasingly complex digital environments where they can examine language patterns in wider contexts and gather statistical evidence to support research hunches.
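For readers unfamiliar with the standard presentation mentioned above, a KWIC concordance can be sketched in a few lines of Python. This is only an illustration and not the project's code: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for the example.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `width` tokens on each side of the hit
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, whitespace-tokenised for simplicity
sample = ("the results suggest that the data support the claim "
          "and the results show that the method works")
tokens = sample.split()

for left, kw, right in kwic(tokens, "results"):
    # Right-align the left context so the keywords line up in a column
    print(f"{left:>20} | {kw} | {right}")
```

Each hit is printed on its own line with the keyword aligned in the centre. The reader still has to scan down the column to notice that "suggest that" and "show that" are recurring continuations, which is the kind of pattern a tree view is meant to make visible at a glance.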
During the project we will engage with different kinds of potential users, to find out what they want from the interface, and to ensure that any usability problems are identified and corrected. We will have face-to-face meetings with groups of corpus linguists, novice researchers, and language teachers and learners. Additionally, we will make each new version of the interface available online, so that a much larger group of stakeholders can try it out while we conduct remote usability testing. We will reach stakeholders in the UK and also in a variety of HE institutions overseas. We will keep a blog diary of our progress at http://cuba.coventry.ac.uk/wordtree/
The aim of this project is to develop a multi-dimensional Word Tree interface which will allow users to search and browse within documents and across a corpus, and to access an instant visual representation of the language patterns surrounding any given word or phrase. Our goal is to increase access to, and use of, corpus resources, both by corpus linguists and by language teachers and learners. To reach this goal the interface will have to be accessible and fun for non-experts, whilst providing useful pattern information for stakeholders at all levels.
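The idea of a word tree can be illustrated with a minimal Python sketch, assuming a whitespace-tokenised text and a tree that branches only to the right of the search word. The names `build_word_tree` and `render`, the `depth` parameter and the sample sentence are hypothetical and are not taken from the project's implementation.

```python
def build_word_tree(tokens, keyword, depth=3):
    """Merge the words following each hit of keyword into a nested tree.

    Each node maps a word to {"count": n, "children": {...}}, where n is
    the number of hits whose continuation passes through that word.
    """
    root = {}
    for i, tok in enumerate(tokens):
        if tok.lower() != keyword.lower():
            continue
        node = root
        for nxt in tokens[i + 1:i + 1 + depth]:
            entry = node.setdefault(nxt.lower(), {"count": 0, "children": {}})
            entry["count"] += 1
            node = entry["children"]
    return root


def render(tree, indent=0):
    """Return indented text lines, most frequent branches first."""
    lines = []
    ordered = sorted(tree.items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    for word, entry in ordered:
        lines.append("  " * indent + f"{word} ({entry['count']})")
        lines.extend(render(entry["children"], indent + 1))
    return lines


# Invented sample text for illustration only
sample = ("the results suggest that x the results suggest a y "
          "the results show that z")
tree = build_word_tree(sample.split(), "results", depth=2)
print("\n".join(render(tree)))
```

Shared continuations are merged into a single branch and annotated with a frequency, so a user can see which patterns follow the search word most often without reading every concordance line. A graphical interface would draw these branches visually rather than as indented text.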
Anticipated Outputs and Outcomes
- Visualisation applets, with reference implementations for the BAWE corpus.
- An open REST API to the corpus to significantly improve the usability of the resource, and to promote further visualisation development.
- Clear, easily modifiable pattern library documentation for the Word Tree, explaining suitable use-cases and potential customisations for other corpora.
- A publicly accessible, open source-code repository with feature discussion area.
- A project blog that documents the progress of the project.
- A technical report detailing design, testing and implementation, and describing the lessons learned in terms of learnability and usability design.
- A completion report.