Evaluating Automated Subject Tools for Enhancing Retrieval
The purpose of the project is to test and evaluate existing tools for automated subject metadata generation in order to better understand what is possible and what the limitations of current solutions are, and to make recommendations for services employing subject metadata in the JISC community.
The test-bed for this project will be Intute, a free online service providing access to quality-controlled, manually selected and catalogued Web resources for learning and research. We envisage that the project outputs will improve understanding of the value of subject metadata tools and of how to evaluate them, and will identify opportunities that should be exploited further as part of the e-infrastructure for education and research.
The project is concerned with the creation and enrichment of subject metadata using existing automated tools. Subject metadata are the most important metadata for resource discovery, yet the most expensive to produce manually. They are also much more difficult to generate automatically than formal metadata such as file type or title. Moreover, because evaluation is costly, automated subject metadata tools are rarely tested in live environments of use. UK HE digital collections, institutional repositories, and aggregators of institutional repository content therefore face a major challenge: how to provide high-quality subject metadata for growing volumes of digital information at reasonable cost.
The project will examine existing tools to determine to what degree they can be integrated into (semi-)automated workflows. The tools for automated subject metadata generation will be tested in two contexts: by Intute cataloguers within the cataloguing workflow, and by end-users who search Intute as part of their research, learning, and information management.
The project will first develop a methodology for evaluating tools for automated subject metadata, and will then apply it in the two contexts above. First, the output of all tools will be evaluated against a purpose-built 'gold standard'. The best tool(s) for Intute's purposes will then be implemented in a demonstrator that feeds its results into the cataloguing workflow, and this integration will itself be evaluated. Finally, a task-based end-user retrieval study will be conducted to determine the contribution of automatically assigned terms and manually assigned terms, each alone and in combination, to retrieval success (retrieving relevant documents) and failure (missing relevant documents and retrieving irrelevant documents).
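The description does not name the measures to be used, but precision and recall are one standard way to score both stages. Comparing a tool's terms with the gold standard is one case: precision penalises wrongly assigned terms, and recall penalises missed ones. The retrieval failures listed above map onto the same two measures, since missing relevant documents lowers recall and retrieving irrelevant documents lowers precision. The following is a minimal sketch under stated assumptions: terms (or document IDs) are compared as exact strings, and the function names and sample terms are hypothetical, not part of the project's methodology.

```python
from __future__ import annotations

from statistics import mean


def precision_recall(assigned: set[str], gold: set[str]) -> tuple[float, float]:
    """Score one resource: compare the terms a tool assigned (assumed exact
    string matches) with the gold-standard terms for that resource."""
    correct = assigned & gold
    precision = len(correct) / len(assigned) if assigned else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_average(results: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Average per-resource precision and recall over a non-empty list of
    (assigned, gold) pairs, one pair per catalogued resource."""
    scores = [precision_recall(assigned, gold) for assigned, gold in results]
    return mean(p for p, _ in scores), mean(r for _, r in scores)


if __name__ == "__main__":
    # Hypothetical example: gold-standard terms vs. one tool's output.
    gold = {"Metadata", "Information retrieval", "Classification"}
    auto = {"Metadata", "Information retrieval", "Databases", "Indexing"}
    p, r = precision_recall(auto, gold)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1(p, r):.2f}")
```

Run on the sample terms, two of the four assigned terms are correct and two of the three gold terms are found, giving precision 0.50 and recall 0.67. Macro-averaging weights every resource equally. A micro-average, which pools the counts across all resources, is an alternative when resources carry very different numbers of terms.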
Lead Institution
- University of Bath, UKOLN
Partner Institutions
- University of Glamorgan
- Intute
- MIMAS, University of Manchester
- City University London; University of Maryland
- Royal School of Library and Information Science, Denmark
- OCLC