FAIR Synthesis: Metadata

This webpage has been archived. Its content will not be updated.

Metadata issues are fundamental to the design and functioning of an e-print repository and the provision of access to information in general.  This section explores some of the issues involved and highlights achievements of FAIR projects in this area.

The SHERPA project has also produced a general resource for e-print repositories, ‘An introduction to metadata requirements for an e-print repository’, which gives an overview of the issues involved.


Quality and QA of metadata

Metadata quality is a key issue in developing repositories.  It directly affects resource discovery and user satisfaction.  It also has implications for institutional ‘buy in’, as an institutional repository is a very visible service.  Several FAIR projects are investigating issues related to metadata quality, and the articles below reflect their findings:


Descriptive metadata

Descriptive metadata is used for indexing, discovery, and identification of items in repositories. OAI-PMH specifies unqualified Dublin Core as a basic requirement. Dublin Core is simple and flexible, but it wasn’t developed with repositories in mind. It needs to be used in a consistent way to enable searching and browsing across repositories. FAIR projects have developed guidelines to enable Dublin Core to be used consistently for different types of repositories.
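
To make the idea of an unqualified Dublin Core record concrete, here is a minimal sketch in Python that builds one in the oai_dc container used by OAI-PMH. The field values, the example URL, and the helper name build_oai_dc are illustrative assumptions, not taken from any FAIR project guideline; only the two namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for the oai_dc container and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_oai_dc(fields):
    """Build an oai_dc:dc element from (element_name, value) pairs.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# All values below are made up for the example.
record = build_oai_dc([
    ("title", "An example e-print"),
    ("creator", "Smith, Jane"),
    ("type", "Text"),
    ("date", "2004-05-01"),
    # Assumed convention: the identifier carries a link to the item.
    ("identifier", "http://repository.example.org/123/"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every element is optional and repeatable in unqualified Dublin Core, two repositories can both produce valid records like this and still disagree on, say, name order in dc:creator or what dc:identifier points to. That is the gap the project guidelines aim to close.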

ePrints UK developed a useful guide to using Dublin Core for e-prints, so that the metadata they harvested would be consistent.  During the project they explored various issues associated with using Dublin Core in a consistent way, e.g. how to encode full text links so they point to the correct document.

Similarly, in the area of electronic theses and dissertations (ETDs), FAIR projects have worked together to develop guidelines for use of Dublin Core.  Electronic Theses, Theses Alive!, and DAEDALUS have collaborated to develop a UK Metadata Core Set for ETDs.

Dublin Core was developed for bibliographic and other print materials and doesn’t adequately describe images and museum objects.  Accessing the Virtual Museum, BioMed Image Archive, Harvesting the Fitzwilliam, and Hybrid Archives explored this and the implications for discovery in a metadata issues paper.


Subject categorisation and vocabularies

Using subject categories or controlled vocabularies in conjunction with metadata has the potential to improve resource discovery. TARDis has been a leader in advancing work on subject categorisation. At the 2nd Workshop on the Open Archives Initiative (OAI): Gaining Independence with ePrints Archives and OAI, held 17-19 October 2003 at CERN in Geneva, a forum for discussion of 'subject' issues was identified as an important next step.  Southampton is introducing this forum on the oai-eprints mailing list, which was created as a result of the workshop, with a series of discussion points on the subject categorisation of e-print archives.

Accessing the Virtual Museum has developed a specialist Egyptology thesaurus to support and describe the records of museum objects created.  This comprises four separate vocabularies covering object names, place names, dates (and mechanisms for describing these), and material types.  Details of these vocabularies are available from project staff.  They can also be seen in action by going to the Petrie Museum search page and selecting ‘Search the online catalogue’ from ‘The Petrie Museum’ menu.  The vocabularies can be browsed to select a search term.

As part of its work, the PORTAL project investigated how external resources should be surfaced within an institutional portal. A key element in presenting these resources is how they are described according to their subject area.  A discussion paper has been made available to raise some of the issues involved and encourage ongoing discussion of them.


Preservation metadata

Hybrid Archives has developed a new model for the preservation of datasets. The model allows data to be deposited at the AHDS through harvesting via OAI-PMH, while the content is also held by the data owner, who then provides access to it. This differs from current practice, where data preservation traditionally involves handing over the entire dataset to the AHDS or a similar body, which then preserves it and also provides access to it.  For further information about the hybrid model, see the section on repository models. The model will be supported by reports on preservation requirements and preservation metadata, to be posted on the project web site in summer 2005.
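
As a rough illustration of what harvesting via OAI-PMH involves at the protocol level, the sketch below builds a ListRecords request URL. The base URL and the function name list_records_url are hypothetical, and this is not the Hybrid Archives implementation. The verb, metadataPrefix, set, and from parameters are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None):
    """Return an OAI-PMH ListRecords request URL.

    Hypothetical helper; base_url would be the data owner's OAI-PMH endpoint.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# The endpoint below is a placeholder, not a real repository.
url = list_records_url("http://repository.example.org/oai",
                       from_date="2005-01-01")
print(url)
```

A harvester would fetch this URL, parse the returned XML, and repeat the request with any resumptionToken the response contains until the list is complete.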


Rights metadata

A key objective of the RoMEO project was to develop a solution for protecting the IPR of e-prints in an OAI environment.  The project first surveyed academic authors and data and service providers about the rights they wished to protect.  The rights solution involved developing simple rights metadata by which authors could describe the rights status of their e-prints, and a means by which OAI data providers and service providers might assert the rights status of their metadata under OAI-PMH.  This can be done using Creative Commons licenses.  The work of RoMEO influenced and fed into the formation of the OAI-rights Technical Working Group in the US, which seeks to extend the findings from RoMEO into a generic solution for use with OAI (as opposed to just for e-prints).  Draft implementation guidelines have been produced and are available for viewing and comment.


User metadata

The PORTAL project undertook extensive studies to specify the requirements for institutional portals across UK institutions.  This included a report detailing the available metadata standards for the description of users within an institutional portal environment.
