Increasing Interoperability between Corpus Tools
Summary
This project aims to introduce corpus linguistics methods to research communities who are engaged in the study of language from different perspectives, and who have previously drawn on only a limited range of corpus software, or none at all. It will explore ways of linking different corpus query tools so that users can investigate aspects of the same data in a variety of ways. The number of tools that it will be possible to interlink will depend on software configurations and the willingness of other software developers to incorporate interoperability features. We will, however, be able to offer a prototype tool to link the WordTree, Intellitext, CQPweb and Wmatrix, four core tools developed at Coventry, Leeds and Lancaster.
Stakeholders who attended meetings in connection with the WordTree project commented on the difficulty of persuading novices to use corpus tools, because they often needed to switch from one application to another in order to fully satisfy a corpus query. At best this is time-consuming; at worst it prevents researchers from reaching a satisfactory outcome, because they do not have access to all the tools they need, or do not know how to present their data in the form required by an unfamiliar tool. Combining the four tools will help to embed them in the research process for existing users, and will also attract new user groups (for example stylisticians) who have so far resisted the adoption of corpus linguistic methodologies.
Objectives
The project will explore the potential for interoperability between different corpus query tools, report on possible approaches to achieving it, and demonstrate a prototype linking the WordTree, Intellitext, CQPweb and Wmatrix. It will also disseminate its findings widely across the research communities concerned with texts and language use, to encourage researchers to try methods that are established in other disciplinary communities but not yet familiar in their own.
Anticipated Outputs and Outcomes
- A technical report providing the results of the survey
- A prototype linking the four core systems of the project (CQPweb, Intellitext, WordTree, Wmatrix)
- Documentation for the prototype, made publicly available from the project website and from public code repositories (e.g. Google Code, GitHub)
- A project blog that documents the progress of the project, with postings at least monthly
- A technical report detailing the design, testing and implementation of the integrated tool
- A final budget and completion report
- A survey article submitted for publication in the Language Resources and Evaluation Journal