This joint project from UKOLN, NaCTeM and Knowledge Integration brings together each partner's experience in text analysis and information extraction techniques to carry out a practical evaluation of formal metadata generation methods within real-world workflows. These workflows include the well-known problem of metadata deposit, as well as two from later in the metadata lifecycle: triage, the incremental improvement of metadata through error identification and correction; and normalisation, the increase of consistency for a specific purpose, such as republishing the record as part of an overlay journal. The project evaluates the suitability of extracted formal metadata for purposes such as creating metadata records, providing input to existing services for external subject classification or geographical localisation, and reviewing resource accessibility and preservation.
The proposal will make use of expert knowledge from each partner to evaluate existing tools, services and prototypes in a number of real-world contexts, including UKOLN's managed harvesting and aggregation tool, the University of Bath OPUS repository, and the University of Minho's REPOSITORIUM, the latter enabling practical evaluation of tool performance on languages other than English. It will also examine the use of these tools for metadata consumption in structures built on metadata, in particular the overlay journal and OAI-ORE creation workflows. Acknowledging that the use of automated tools can have undesirable results in some scenarios, the project will additionally seek input on the ethical and legal impact of these practices. Recognising that the sustainability and practical impact of this work both depend on effective review and dissemination, the EPrints project, a member of the DSpace development team and an experienced Fedora developer have agreed to take part in a consultation process evaluating the results for practical application within mature institutional repository platforms.
- Knowledge Integration