Infrastructure for Resource Discovery
In 2009 JISC and RLUK convened a group of Higher Education library, museum and archive experts to consider what national services were required to support online discovery and reuse of collection metadata. This group, the Resource Discovery Taskforce (RDTF), met four times during 2009. The taskforce produced a vision and an implementation plan focused on making metadata about collections openly available, thereby supporting the development of flexible and innovative services for end users. Summaries of the RDTF meetings, the vision and the implementation plan are available on the RDTF blog, which will also be used to post news about progress.
The RDTF vision sets targets to be met by the end of 2012. JISC will fund a range of projects, communication and support activities designed to meet those targets. This work falls into four broad categories: institutional level, aggregation level, service level and support. JISC has already begun to fund work in these categories.
This programme of projects has been funded to begin addressing the challenges that must be overcome at the institutional level to realise the RDTF vision. The projects focus on making metadata about library, museum and archive collections openly available, using standards and licences that allow the data to be reused.
These projects are being funded to explore how library, archive and museum metadata can be made openly available on the web. This exploration should identify approaches that will work for other Higher Education institutions. It will also allow the projects to examine the major issues involved in publishing metadata under an open licence, such as licensing, schemas, provenance, authority and technical questions. Projects will be required to discuss these issues on their project blogs, and JISC and the relevant RDTF projects will collect these lessons and process them into appropriate formats so that the sector as a whole benefits.
What are the projects doing?
COMET - Cambridge University - The COMET project will release a large subset of bibliographic data from Cambridge University Library catalogues as open structured metadata, testing a number of technologies and methodologies including XML, RDF, SPARQL and JSON. It will investigate and document which metadata for the library's collections can be released openly in machine-readable formats, and the barriers that prevent other data from being exposed in this way.
Estimated amount of data to be made available: 2,200,000 metadata records
Connecting Repositories - Open University - The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories. The project will use Linked Data to describe the relationships between papers stored across a selection of UK repositories, including the Open University's Open Research Online (ORO). A resource discovery web service and a demonstrator client will be provided so that UK repositories can embed this new tool in their own systems.
Estimated amount of data to be made available: content of 20 repositories, 50,000 papers, 1,000,000 RDF triples
Contextual Wrappers - Cambridge University - The project is concerned with the effectiveness of resource discovery based on metadata about the Designated collections at the Fitzwilliam Museum in the University of Cambridge, made available through the Culture Grid, an aggregation service for museum, library and archive metadata. The project will investigate whether the Culture Grid interface and API can be enhanced to let researchers explore hierarchical relationships between collections and browse object records within a collection.
Estimated amount of data to be made available: 164,000 object records (including 1,000 new/enhanced records), 74,800 of them with thumbnail images for improved resource discovery
Discovering Babel - Oxford University - The digital literary and linguistic resources in the Oxford Text Archive and the British National Corpus have been available to researchers throughout the world for several decades. This project will focus on technical enhancements to the resource discovery infrastructure that will allow wider dissemination of open metadata and facilitate interaction with research infrastructures. The knowledge and expertise gained will be shared with the community.
Estimated amount of data to be made available: 2,000 literary and linguistic resources in electronic form
Jerome - University of Lincoln - Jerome began in the summer of 2010 as an informal 'un-project', with the aim of radically integrating the data available to the University of Lincoln's library services and offering a uniquely personalised service to staff and students through new APIs, open data and machine learning. This project will develop a sustainable, institutional service for open bibliographic metadata, complemented by well-documented APIs and an 'intelligent', personalised interface for library users.
Estimated amount of data to be made available: a library catalogue of ~250,000 bibliographic records, along with constantly expanding data about our available journals and their contents, augmented by the Journal TOCs API, and c.3,000 additional records from our EPrints repository
Open Metadata Pathfinder - King's College London - The Open Metadata Pathfinder project will deliver a demonstrator showing how effectively archival catalogues can be opened up to wider automated linking and discovery by embedding RDFa metadata in the collection-level catalogue descriptions of Archives in the M25 area (AIM25). As part of the AIM25 system, it will also implement automated publishing of the system's high-quality authority metadata as open datasets. The project will assess the effectiveness of automated semantic data extraction using natural language processing tools (GATE), and will measure the effectiveness of the approach through statistical analysis and review by key stakeholders (users and archivists).
Salda - Sussex University - The project will extract the metadata records for the Mass Observation Archive from the University of Sussex Special Collections' archival management system (CALM) and convert them into Linked Data that will be made publicly available.
Estimated amount of data to be made available: This project will concentrate on the largest archival collection held within the Library, the Mass Observation Archive, potentially creating up to 23,000 Linked Data records.
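To make the idea of converting catalogue records into Linked Data a little more concrete, here is a minimal sketch in Python (standard library only) that turns a single archival record into RDF triples in N-Triples syntax. It is an illustration under stated assumptions, not the SALDA project's method: the field names (RefNo, Title, Date, ParentRefNo), the example.org base URI and the choice of Dublin Core terms are all hypothetical, and a real CALM export would need its own field mapping.

```python
from urllib.parse import quote

# Hypothetical namespaces: example.org stands in for a real archive URI scheme.
BASE = "http://example.org/archive/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from catalogue fields to Dublin Core terms.
FIELD_TO_TERM = {"Title": "title", "Date": "date", "Description": "description"}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_uri(ref_no: str) -> str:
    """Mint an IRI for a record from its reference number (slashes kept)."""
    return f"<{BASE}{quote(ref_no, safe='/')}>"


def record_to_ntriples(record: dict) -> list[str]:
    """Convert one catalogue record (a dict of fields) into N-Triples lines."""
    subject = record_uri(record["RefNo"])
    lines = []
    for field, term in FIELD_TO_TERM.items():
        value = record.get(field)
        if value:  # skip empty or missing fields
            lines.append(f'{subject} <{DCTERMS}{term}> "{escape_literal(value)}" .')
    # Link the item to its parent record to keep the archival hierarchy.
    parent = record.get("ParentRefNo")
    if parent:
        lines.append(f"{subject} <{DCTERMS}isPartOf> {record_uri(parent)} .")
    return lines


if __name__ == "__main__":
    # Hypothetical record; real reference numbers and fields will differ.
    sample = {"RefNo": "EXAMPLE/1/2", "Title": "Diary, 1939",
              "Date": "1939", "ParentRefNo": "EXAMPLE/1"}
    print("\n".join(record_to_ntriples(sample)))
```

Linking each item to its parent with a property such as dcterms:isPartOf is one way to preserve the archival hierarchy, so that the published records can be navigated as connected data rather than as isolated descriptions.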
OpenART - York University - OpenART, a partnership between the University of York, Tate and technical partners Acuity Unlimited, will design and expose linked open data for an important research dataset entitled "The London Art World 1660-1735". Drawing on metadata about artists, places and sales from a defined period of art history scholarship, the dataset offers a complete picture of the London art world during the late 17th and early 18th centuries. Links to the Tate collection and the incorporation of collection metadata will also allow users to explore works in their contemporary locations. The process will be designed to scale to much richer and more varied datasets, at York, at Tate and beyond.
Estimated amount of data to be made available: At the heart of OpenART is "The London Art World 1660-1735", a major research and metadata-creation activity designed to provide a detailed, archive-based resource, enabling users to explore this art world's lost networks, markets and geographies. It comprises four interconnected categories of material and information:
- People: a biographical dictionary describing some 4,000 painters, auctioneers, dealers, patrons, collectors, print publishers and artists' suppliers;
- Sales: a calendar of over 2,000 public art sales, charting the resale market from its beginnings in the Restoration period to the 1730s;
- Place: a map of the art world that plots artists' premises, print shops, auction venues, collectors' houses and other places onto contemporary maps;
- Sources: over 12,000 sources comprising a bibliography of primary and secondary literature and full-text primary sources. These will include transcripts of several thousand newspaper advertisements and George Vertue's notebooks.
How does this relate to other programmes?
The RDTF is a large programme with many different facets and relationships. An overview of the RDTF work and how it fits together can be found on the RDTF blog. Some of the reports, projects and other resources most relevant to these projects are listed here.
JISC funded the development of the Open Bibliographic Data Guide for librarians who want to explore the implications of open bibliographic data. The guide will be extended during the RDTF work.
Edina produced a study investigating the issues involved in aggregating metadata about images and time-based media.
Three projects funded under the JISCexpo call will produce metadata and knowledge relevant to the RDTF vision:
Projects will be funded at Edina and Mimas to enhance existing aggregations, such as Suncat and Copac, in line with the RDTF vision.
Libraries, museums and archives throughout Europe are working on open bibliographic data and open metadata projects that are relevant to JISC's work on the RDTF vision. These projects and developments are too numerous to list in this briefing paper, but here are some particularly useful links:
Where can I go for further information?
The RDTF blog is the main source for news of developments in the pursuit of the RDTF vision.
The RDTF vision is available from the IE repository. The implementation plan is available from the RDTF blog.
In the next few months a new central site will bring together all the work on implementing the RDTF vision.
Project websites are currently being added and will be linked from the bottom of this page. Every project will have a blog, which will be used to provide updates and share lessons learned.