Metadata Enrichment for Repositories in a London Institutional Network
This project (MERLIN) will use off-the-shelf text mining techniques to enrich the functionality of LASSO, the SHERPA-LEAP consortial repository cross-searching service. LASSO offers search across aggregated, normalised metadata harvested from London-based institutional repositories via OAI-PMH. MERLIN will use the TerMine term extraction tool to derive terms from the full-text digital objects held in LASSO's source repositories, weight those terms, and enrich the LASSO database with the resulting keywords. The derived terms will be exposed at various points in the LASSO interface to support resource discovery. In a supplementary strand, MERLIN will apply the tools developed by the HILT project to build a pilot hierarchical, browsable subject tree from the text-mined keywords. The remodelled interface will undergo usability testing, and an end-user evaluation will inform the project's development work.
A summative evaluation report covering all project outputs will be prepared, and an open-source, reusable web application will be created so that the MERLIN metadata enrichment technology can be incorporated into any repository, regardless of platform.