Online catalogue and repository interoperability study
The aims of this study were to investigate and report on the: extent to which academic repository content is already held in library OPACs; interoperability of OPAC and repository software for the exchange of metadata and other information; various services offered to institutional managers, researchers, teachers and learners respectively by OPACs and by repositories; potential for improvements in the links from repositories and/or OPACs to other institutional services, such as finance or research administration; and development of possible further beneficial links between library OPACs and institutional repositories.
Executive Summary
Context
What is an Institutional Repository (IR) and what should be its role? Library Management Systems (LMSs) and their Online Public Access Catalogues (OPACs) have traditionally been used by HE institutions to provide information about the publications and other bibliographic output of the organisation. Although this role is well-established, the development of IRs potentially conflicts and overlaps with these functions.
Library and repository systems are undergoing intensive development to adapt to the demands of a digital, networked environment, taking into account changing technologies, user behaviours, and research and service agendas of institutions - yet links between these and other HEI systems are often non-existent or inadequate.
There is little evidence that the range of institutional, departmental, subject or format specific IRs and OPACs used in UK HEIs are being developed with shared workflows, re-use of data, or other considerations of efficiency. This inhibits the development of services such as the production of management reports or publications lists for research assessment and planning. It also inhibits internal and external end users wishing to easily access the institution’s rich range of documentary output. Interoperability between systems is a key mechanism for improving this situation. This is underpinned by accurate metadata, the re-use of data and, where possible, open standards. These allow, for example, the derivation of system and function specific information from a single master record. Fragmentation must be avoided. Interoperability should be built into the workflows of relevant institutional systems, including administrative processes dependent on such metadata; all institutional stakeholders managing information systems should be involved in discussions about how this can be achieved.
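As a rough illustration of deriving system-specific information from a single master record, the following sketch assumes a hypothetical `MasterRecord` structure; the field names, the two derived views and the example URL are invented for illustration and are not taken from any system examined in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MasterRecord:
    """Hypothetical single master record; all field names are illustrative."""
    title: str
    creators: List[str]
    year: int
    item_type: str
    full_text_url: Optional[str] = None  # typically of interest to a repository
    shelfmark: Optional[str] = None      # typically of interest to a catalogue


def to_catalogue_view(rec: MasterRecord) -> dict:
    """Derive the fields a catalogue (OPAC) display might need."""
    return {
        "title": rec.title,
        "author": "; ".join(rec.creators),
        "year": rec.year,
        "shelfmark": rec.shelfmark,
    }


def to_repository_view(rec: MasterRecord) -> dict:
    """Derive simple Dublin Core-style elements for a repository."""
    view = {
        "dc:title": rec.title,
        "dc:creator": list(rec.creators),
        "dc:date": str(rec.year),
        "dc:type": rec.item_type,
    }
    # Only emit an identifier when a full-text location is actually recorded.
    if rec.full_text_url:
        view["dc:identifier"] = rec.full_text_url
    return view


# Invented example data: one record, two derived views.
record = MasterRecord(
    title="An example thesis",
    creators=["Smith, J.", "Jones, A."],
    year=2009,
    item_type="Thesis",
    full_text_url="https://repository.example.ac.uk/1",
)
catalogue_view = to_catalogue_view(record)
repository_view = to_repository_view(record)
```

Because both views are computed from the same record, a correction made once to the master data is reflected in each dependent system without re-keying.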
The aims and objectives of OCRIS were to:
-
Survey the extent to which repository content is in scope for institutional library OPACs, and the extent to which it is already recorded there
-
Examine the interoperability of OPAC and repository software for the exchange of metadata and other information
-
List the various services to institutional managers, researchers, teachers and learners offered respectively by OPACs and repositories
-
Identify the potential for improvements in the links (e.g. using link resolver technology) from repositories and/or OPACs to other institutional services, such as finance or research administration
-
Make recommendations for the development of possible further links between library OPACs and institutional repositories, identifying the benefits to relevant stakeholder groups
Key findings
Interoperability and services
-
Interoperability between IRs and LMSs in UK HEIs is currently rare - only 2 percent of questionnaire respondents state that their systems definitely interoperate, with a further 14 percent stating that interoperability is pending
-
Interoperability of either of these system types with some type of other institutional system is moderately high, and is slightly higher for LMSs than IRs
-
Interoperability between LMSs or IRs and a range of other institutional systems is limited. It cannot be said that interoperability is substantial or that a wide variety of administrative systems interoperate with any individual library system
-
The REF has clearly been a factor in the establishment or consideration of interoperability between Institutional Repositories and other administrative systems
-
Services stemming from library systems are limited and narrow, excepting the generation of usage statistics and metadata enhancement services
-
The generation of reports for specific administrative departments is not a common service offered by either IRs or LMSs
-
The most popular service offered by IRs remains 'advice on Open Access', suggesting perhaps that they are still in their infancy and still require explanation, and thus have yet to widen their range of services
-
The use of metasearch/linking tools as well as web services and APIs is moderately popular within LMSs and IRs; the data gathered is not sufficient to discern why these tools and services are being used, or for what purposes
Duplication and scope
-
There is significant scope overlap (81 percent) for all item types held in IRs and OPACs
-
The scoping distinctions and boundaries for IRs and OPACs are becoming increasingly blurred, with many IRs containing bibliographic data and OPACs containing links to full text
-
Duplication at both record and item level is frequent, especially for print/electronic copies of theses or journal articles
-
For many OPACs and IRs, any type of content is in scope, regardless of the items currently recorded or held
-
Links between print (OPAC) and electronic (IR) copies of theses are frequently instantiated within both systems
-
Linking for content other than theses is not common. However, within some IRs link resolvers are beginning to be used to direct users to related library holdings
-
Some HEIs are choosing to expose both OPAC and IR data through the use of Resource Discovery Platforms (RDPs) which offer 'vertical search' functionality. The popularity of these systems seems likely to increase
Authority control and description
-
Authority control within LMSs is high; format and content standards are well supported, with MAchine-Readable Cataloguing (MARC) standards and Library of Congress Subject Headings (LCSH) the most commonly used alongside local authority lists
-
In IRs there is little authority control for subjects, and only a moderate amount of effective classification. In-house lists are predominantly used for the construction and maintenance of name authorities
-
Within Institutional Repositories standards are not applied adequately and often not at all
-
There are frequent inconsistencies and a lack of completeness in statements made on both IR and OPAC Web pages about item types and scoping
-
Modifications allowed to the item and format fields of DSpace or EPrints software, and the presentation of administrative or outdated terms by LMSs within their search/browse lists, undermine consistency, standardisation and clarity for end users across the sector
-
The use of vocabularies and standards within library systems in any given HEI is fragmented and disjointed; there is little commonality in resource description
-
LCC is frequently used within EPrints repositories as the top levels come bundled with the software; however there appears to be some confusion in the IR community about the distinction between Library of Congress Classification (LCC) and Library of Congress Subject Headings (LCSH). This is clearly significant, suggesting limited knowledge and expertise of professional standards within the IR community
-
There are many benefits in recognising the Author/Creator/Person field as the metadata element common to all internal HEI systems
-
There is an increasing awareness of the role that could be played within Institutional Repositories by institutional IDs
-
There will be an increased role for institutional (or even international, if one is to be more ambitious) personnel and group identity management schemes to enable authentication, access to services and the gathering and compilation of data for internal and external purposes. Supporting this can become time-consuming as HR-produced codes or the relationship between a department and the University hierarchy may be subject to constant change. Keeping records up-to-date can therefore be an ongoing challenge.
-
Resource Discovery Platforms (RDPs), because of their nature (for example, the use of 'tag clouds') reveal inadequate metadata in catalogue records more readily than do other interfaces.
-
RDPs are only a partial solution to making all relevant items visible to users in one place; they may present search results in ways not best suited to the scholarly needs of higher-level end users and do not necessarily reveal the richness of a library's collections.
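The findings above on the Author/Creator/Person field and on institutional IDs can be illustrated with a small sketch. It assumes hypothetical repository records that carry both a free-text creator name and an institutional person ID; the names, IDs and field names are all invented.

```python
from collections import defaultdict

# Invented records: two different people share the name form "Smith, J.",
# and one person (p001) appears under two different name forms.
repository_items = [
    {"title": "Thesis A", "creator_name": "Smith, J.", "person_id": "p001"},
    {"title": "Article B", "creator_name": "Smith, John", "person_id": "p001"},
    {"title": "Article C", "creator_name": "Smith, J.", "person_id": "p002"},
]


def group_titles(records, key):
    """Group item titles by the value of the given field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec["title"])
    return dict(groups)


# Grouping on the uncontrolled name string conflates two people and
# splits one person's output across two headings.
by_name = group_titles(repository_items, "creator_name")

# Grouping on the institutional ID keeps each person's output together.
by_id = group_titles(repository_items, "person_id")
```

In this toy data the name-based grouping attributes "Thesis A" and "Article C" to the same heading although they belong to different people, whereas the ID-based grouping does not.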
Flexibility and working practices
-
The fragmentation or disconnectedness of HE information systems puts a strain on the abilities of cataloguers/bibliographic services staff to work effectively across both IRs and LMSs or for shared workflows to be developed across departments
-
Lack of resourcing puts a strain on the abilities of cataloguers/bibliographic services staff to work effectively across both IRs and LMSs or for shared workflows to be developed across library departments
-
Administrative staff and the systems with which they work are not sufficiently involved in cross-departmental collaborations with library systems staff
-
Batch processing is still considered an effective way to share data between non-library and library systems (particularly HR systems and LMSs)
Recommendations to Higher Education institutions
-
Expose all LMS and IR records for harvesting and linking (except in cases where legal requirements restrict such data re-use) via distributed/federated/meta search using technical protocols such as OAI-PMH, Z39.50, SRU/SRW or link resolvers, as appropriate to the technical infrastructure
-
Improve co-ordination between all departments possessing institutional information-gathering systems and their staff, with support at the highest levels of the institution, in order to develop efficient workflows, reduce unnecessary duplication of effort and formalise collaboration
-
Align the systems of both libraries and administrative departments, and their attendant data-processing practices, more closely
-
Consider establishing a centralised system and attendant workflows for cross-checking and cleaning metadata that is to be shared between systems, to ensure quality, usability and reusability by both internal and external service providers
-
Consider options other than batch processing (such as web services or applications, underpinned by open standards) where administrative departments are sharing data with library systems
-
Develop clear policies on the scopes and uses of IRs and OPACs
-
Present clearly, comprehensively and comprehensibly, to both staff and end users, the scopes of IRs and OPACs
-
Develop a single scheme for describing item types/formats and scope within OPACs and IRs, with interoperability requirements and local needs fully accounted for. This scheme should be tested with a variety of users (teaching staff, researchers and undergraduates) to ensure it speaks to their needs and is understood by them. If JISC act on recommendation A above, library staff should be allowed to attend meetings and contribute to the formulation of a cross-institutional scheme, with local needs discussed as part of the activities of the group
-
Ensure the use of format and content standards within IRs to avoid the need for future 'retroconversion' or 're-keying'
-
Support the interoperability of subject authorities across institutional systems if common ones are not appropriate – this might build on the work of existing mapping and switching projects such as OCLC's Terminology Service Pilot and the High-Level Thesaurus Project (HILT).
-
Use interchange formats and cross-walks based on open standards more widely and extensively, to assist in the sharing and exchange of records conforming to different format and content standards.
-
Reassess the use of Library of Congress Classification (LCC) within IRs; staff should be familiar with the distinction between LCC as a classification system and LCSH as a subject heading system in order to determine whether these schemes meet the needs of their users and whether they accurately reflect their repository collections/items.
-
Establish controlled name authority lists for staff throughout the institution using agreed, recognised standards, to be made available to all relevant departments.
-
Develop (or if already in place, make consistent use of) persistent, institutional or departmental IDs, making these available internally and to other institutions. Relevant institutional systems could possibly hold these IDs in the form of a flat file. This would allow data relating to specific individuals to be 'pushed' and 'pulled' between various systems. The IDs would become the 'glue' allowing information to be disambiguated. These IDs should be built into metadata workflows and be usable by both staff and end users. Person and role information from various institutional systems should be 'warehoused' and made available as 'a rich source of contextual metadata' (Green, 2007).
-
Recognise that LMSs are a rich source of bibliographic information on books, book items, monographs, conference proceedings and other items authored by institutional staff (which may not be recorded in an IR); an interoperable LMS could therefore be leveraged for research assessment data gathering activities or for use within Research Management Systems
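To make the first recommendation above more concrete, the following is a minimal sketch of how a harvester might request and read records exposed over OAI-PMH. The base URL and the sample response are invented for illustration; only the `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications. No network request is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the dc:title values found in each harvested record."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter("{%s}record" % OAI_NS):
        for title in record.iter("{%s}title" % DC_NS):
            titles.append(title.text)
    return titles


# Invented, abbreviated response of the kind a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Smith, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; a real harvester would fetch this URL over HTTP.
url = list_records_url("https://repository.example.ac.uk/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

A production harvester would also need to handle resumption tokens, error responses and incremental harvesting by date, none of which is shown in this sketch.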