Linking UK Repositories: Technical & Organisational Models to Support User-Oriented Services Across Institutional & Other Digital Repositories
The JISC commissioned the project partners to undertake a scoping study whose aim is to identify sustainable technical and organisational models to support user-oriented services across digital repositories. Open access repositories of interest to UK further and higher education communities were cited as having particular relevance. The study is intended to inform strategies to support access and use of repositories, with a view to the establishment of a national repository services infrastructure or framework.
Executive Summary
A number of lessons and insights have been identified from previous or ongoing studies and from expert opinion that would have significance in any scheme that links UK repositories. The main ones are:
- At the ingest level: technical capabilities vary widely across institutions. As a result, there is huge variation in the quality of metadata provided by repositories, in the preservation activities being undertaken at repository level, and in the systems in place to capture content. The amount of content in repositories also varies hugely: advocacy work with the author community is critically important in raising the level of deposit of research postprints. IPR and copyright remain major stumbling blocks in this respect. Some of these obstacles can strongly discourage repository managers seeking progress. They also mean that the volume of Open Access material available for services to use remains low
- At the aggregator level: metadata quality – or even metadata provision itself – remains the major issue. The technical model proposed in this study describes the optimal approach in this respect (see below)
- At the output level: specialised resource discovery tools are important and provide a route into repositories for users with specific needs, though users may enter repositories in various other ways, too. They may search a specific repository because they are looking for something they know will be located there. They may use subject-based portals if they are searching a specific discipline or topic, or portals that aggregate repository content by object type of interest (such as moving images or theses). In many cases they will arrive at repository content via Google or other web search engines. Repository managers and authors also value the exposure that Google and other web search engines bring to their content, and it is desirable that these be factored into a national scheme, too
The elements of a national linked-repository landscape and the candidate services that would be needed are identified as:
Ingest level
- digitisation services
- services that provide advice on IPR and rights
- services that provide advice and advocacy materials on Open Access
- services that provide help on technical issues
- repository construction services
- repository hosting services
Data level
- institutional repositories
- national-level ‘catch-all’ interim repositories for authors with no institutional repository
- subject-specific repositories gathering primary content
- media-specific repositories gathering primary content
Aggregator level
- metadata creation and enhancement services
Output level
- access and authentication services
- usage statistics services
- preservation services
- research assessment and monitoring services
- resource discovery services
- publishing services
- overlay journal services
- meta-analysis services
- bridging and mapping services
- technology transfer/business advice services
With the exception of the very last point in the list, all these activities are carried out to various degrees by existing services or projects, or are currently being scoped. Many are operating in bounded, discrete areas or as demonstrator projects, however. Scaling up such pilot or project-level activities to the level required for a workable national scheme will require careful planning and a strong leadership role from the JISC.
The following have been identified as top priorities:
- interim ‘catch-all’ repository (or repositories) for authors whose institution does not yet have a repository
- national resource-discovery service
- meta-analysis services, specifically citation analysis and bibliometric analysis services that can inform future national research assessment exercises
- repository usage and statistics services
Second-level priority should be given to:
- preservation services working across areas not already benefiting from specialised services
- a national name authority service
- a national file format/conversion service
An aggregation model is proposed to support the development of end-user services. This model builds on previous recommendations of harvesting using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) as a preferred technical approach. However, the breadth of potential content within open access and other repositories required both a closer examination of OAI-PMH’s capabilities and the identification of additional standards/technologies that might be used to achieve the same ends.
Aggregations bring metadata, and potentially content, together as a basis for end-user services, and remove the need for those services to deal with many individual repositories. Aggregations offer greater control over the data, so this basis is a stable one, whilst leaving full control over the content to the originating repository. Regular aggregation permits efficient and up-to-date access for end-user services to build on. Most valuably, aggregations allow re-factoring of the metadata/content to make it better suited to supporting end-user services than working across individual repositories would be.
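As an illustration of the harvesting step that such an aggregation relies on, the following is a minimal sketch of parsing one page of an OAI-PMH `ListRecords` response into aggregator records. The repository address, the sample record and the function name `parse_list_records` are assumptions made for the example and do not come from the study; a real harvester would fetch the XML over HTTP and repeat the request until no resumption token is returned.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a hypothetical repository (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.ac.uk/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.ac.uk:101</identifier>
        <datestamp>2006-11-30</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example postprint</dc:title>
          <dc:identifier>http://repository.example.ac.uk/101/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text, source_base_url):
    """Return (records, resumption_token) for one ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        # Deleted records carry a status attribute and no metadata.
        if header.get("status") == "deleted":
            continue
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        title = ""
        if dc is not None:
            title = dc.findtext("dc:title", default="", namespaces=NS)
        records.append({
            "oai_identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": title,
            # Keep the origin so services can link back to the repository.
            "source": source_base_url,
        })
    token = root.findtext(
        "oai:ListRecords/oai:resumptionToken", default="", namespaces=NS
    )
    return records, (token.strip() or None)


records, token = parse_list_records(
    SAMPLE_RESPONSE, "http://repository.example.ac.uk/oai"
)
print(records[0]["title"], token)
```

Recording the source alongside each record is one simple way to leave control of the content with the originating repository while the aggregation works only with the harvested copy of the metadata.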
Metadata and content
At the heart of all end-user services across repositories is good quality metadata about the digital content repositories hold. Automated generation of metadata in all its forms is an area requiring additional activity, but also lateral thinking to identify where metadata can be generated. Generation should lead where possible to the creation of a rich base metadata record that can be used by a repository internally whilst acting as the basis for externally-facing metadata formats for exposure to aggregators. The exposure of content for aggregation is less well understood than exposure of metadata. There is a need to model exactly what we wish to do when exposing content so that the technology best suits these requirements. A modelling approach will also be of value in determining the use and granularity for assigning identifiers to digital content and relevant sub-components so that aggregators/end-user services can clearly identify what they are working with.
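The idea of a rich base record that feeds simpler externally-facing formats can be sketched as follows. This is an illustrative assumption rather than a model proposed by the study: the `BaseRecord` fields, the `to_oai_dc` function and the sample values are hypothetical, and the mapping to unqualified Dublin Core is deliberately partial.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"


@dataclass
class BaseRecord:
    """Hypothetical rich internal record held by a repository."""
    identifier: str              # location-independent identifier for the item
    title: str
    creators: list = field(default_factory=list)
    date_issued: str = ""
    item_type: str = ""
    embargo_note: str = ""       # internal-only field, never exposed


def to_oai_dc(record):
    """Derive a simple Dublin Core view of the base record for aggregators."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(name, value):
        # Only emit elements that actually have a value.
        if value:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value

    add("title", record.title)
    for creator in record.creators:
        add("creator", creator)
    add("date", record.date_issued)
    add("type", record.item_type)
    add("identifier", record.identifier)
    return ET.tostring(root, encoding="unicode")


example = BaseRecord(
    identifier="http://repository.example.ac.uk/id/101",  # hypothetical
    title="An example postprint",
    creators=["Author, A.", "Writer, B."],
    date_issued="2006-11-30",
    item_type="Article",
    embargo_note="publisher embargo until 2007-06",
)
print(to_oai_dc(example))
```

The design point is that the richer record stays inside the repository, and each exposure format is generated from it rather than maintained separately.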
Repository interfaces
OAI-PMH is used extensively to facilitate access across repositories through OAI service providers. The use of sets and other protocol containers can enhance its value and allow service provider aggregations to better focus what they offer end-user services. RSS and ATOM newsfeeds are mini-aggregations in their own right from individual repositories, whilst RSS/ATOM readers act as aggregators for newsfeeds from many repositories. Web crawlers aggregate available web page information and present this usually through web search engines. These two alternative approaches offer different paths to enabling an aggregation upon which end-user services can be built.
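To show how sets can focus a harvest, here is a minimal sketch that builds OAI-PMH `ListRecords` request URLs. The `verb`, `metadataPrefix`, `set`, `from` and `resumptionToken` arguments are those defined by the protocol; the base URL, the set name `theses` and the function name `list_records_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL for selective harvesting."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, without metadataPrefix, set or from.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec      # restrict the harvest to one set
        if from_date:
            params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
first_page = list_records_url(
    "http://repository.example.ac.uk/oai",
    set_spec="theses",
    from_date="2007-01-01",
)
next_page = list_records_url(
    "http://repository.example.ac.uk/oai", resumption_token="page-2"
)
print(first_page)
print(next_page)
```

An aggregator serving a theses portal, for example, could harvest only the relevant set from each repository instead of filtering a full harvest afterwards.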
Aggregation and end-user services
Once compiled, aggregations can act as the basis for metadata generation, and offer a more viable option for the processes involved through economies of scale. Aggregations themselves will not normally act as end-user services directly, but rather provide a range of interfaces to enable end-user services to be built on top. This may involve re-exposure via OAI-PMH, RSS or via a web crawler for further aggregation. Once exposed through an end-user service, though, there should always be access back to the originating repository so that additional functionality can be offered and full content sourced.
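One way to picture re-exposure with a route back to the source is the sketch below, which merges records harvested from several repositories and re-exposes them as a simple Atom feed whose entries link to the originating repository. The record fields, repository addresses and function names are hypothetical, datestamps are assumed to be full UTC timestamps so that string comparison orders them correctly, and the output is a simplified feed rather than a complete, validated Atom document.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def merge_harvests(*harvests):
    """Keep the newest copy of each record, keyed by its OAI identifier."""
    newest = {}
    for harvest in harvests:
        for rec in harvest:
            seen = newest.get(rec["oai_identifier"])
            if seen is None or rec["datestamp"] > seen["datestamp"]:
                newest[rec["oai_identifier"]] = rec
    # Most recently changed records first.
    return sorted(newest.values(), key=lambda r: r["datestamp"], reverse=True)


def to_atom(records, feed_id, feed_title, updated):
    """Re-expose aggregated records as an Atom feed with links to the source."""
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    for name, value in (("id", feed_id), ("title", feed_title),
                        ("updated", updated)):
        ET.SubElement(feed, f"{{{ATOM}}}{name}").text = value
    for rec in records:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = rec["oai_identifier"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = rec["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = rec["datestamp"]
        # The link always points back to the originating repository.
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      href=rec["landing_page"], rel="alternate")
    return ET.tostring(feed, encoding="unicode")


# Hypothetical harvests: two from repository A (one record updated), one from B.
old_a = [{"oai_identifier": "oai:a.example.ac.uk:1",
          "datestamp": "2007-01-01T00:00:00Z", "title": "First paper (draft)",
          "landing_page": "http://a.example.ac.uk/1/"}]
new_a = [{"oai_identifier": "oai:a.example.ac.uk:1",
          "datestamp": "2007-02-01T00:00:00Z", "title": "First paper",
          "landing_page": "http://a.example.ac.uk/1/"}]
repo_b = [{"oai_identifier": "oai:b.example.ac.uk:7",
           "datestamp": "2007-01-15T00:00:00Z", "title": "Second paper",
           "landing_page": "http://b.example.ac.uk/7/"}]

merged = merge_harvests(old_a, new_a, repo_b)
feed_xml = to_atom(merged, "http://aggregator.example.ac.uk/feed",
                   "Example aggregation", "2007-02-01T00:00:00Z")
print(feed_xml)
```

Because every entry carries the repository's own landing page, an end-user service built on the feed can send users back to the source for the full content and any additional functionality.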
Architectural approaches
The three main components of the aggregation model are repositories, aggregators and end-user services. Repositories are likely to be independent of aggregators. End-user services, by contrast, are often closely associated with aggregators, although there is an increasing shift towards separating the two (e.g., through Web 2.0 approaches). Viewing the three components as services can facilitate a move towards a service-oriented architecture that provides maximum flexibility in how the components are implemented. Two specific instances of how the components can be linked are the aDORe and CORDRA initiatives. Both promote the concept of exposing as rich a metadata set as possible to facilitate aggregation and the development of end-user services across repositories. aDORe has practically demonstrated many of the CORDRA concepts and both can benefit future development.
Looking ahead
Communication between the components of the aggregation model is vital to the development of effective end-user services. This can underpin the development of more personalised services that end-users themselves require to suit their varied roles within education and research.
Business models for repository services
Repository services might adopt a range of appropriate business models. Here we focus on five:
- institutionally-supported: appropriate for digitisation, repository provision, preservation at some levels and overlay journal production
- publicly-funded (e.g. from top-sliced money allocated by the JISC): appropriate for all advisory services, interim ‘catch-all’ repositories, metadata creation and enhancement, resource discovery, technology transfer and bridging services
- community-supported: appropriate for subject- and media-specific repository provision, usage, assessment, and meta-analysis services and publishing services (particularly where mediated by learned societies)
- subscription-supported: appropriate for access and authentication, preservation and resource discovery services
- fully-commercial models (including advertising-supported, merchant and utility models): appropriate for digitisation, repository provision and hosting, technical advisory services, metadata creation and enhancement, technology transfer, and all output-level services (access/authentication, usage statistics, preservation, monitoring and meta-analysis services, resource discovery, bridging services, overlay journal production and publishing services)
The highest costs are likely to be incurred by preservation and access/authentication services. Resource discovery services and metadata services will have medium to high costs. Repository provision and hosting, digitisation, usage statistics, bridging services and publishing services could operate at a medium-cost level. Advisory services, monitoring and meta-analysis services, technology transfer, subject-specific and interim repositories, and overlay journal services would be expected to be able to operate at relatively low cost.
Recommendations
The full list of recommendations made to the JISC is as follows:
- The research community should be engaged at the highest level to encourage the establishment of repositories in all HE and FE institutions and the development of policies that will ensure the collection of content
- Channels of communication with repository managers should be opened, and the establishment of a community encouraged. This may be done through existing structures: the UKCORR is the most appropriate, and the two main open source repository software packages (EPrints and DSpace) have their own user communities that could also be used for this purpose. The aim is to have clear and effective communication structures in place between the JISC and all operating repositories that will facilitate two-way discussion and enable development
- Similarly, an interface or contact point between the JISC and actual or potential service providers should be established. This will enable end-user oriented services to be developed in a coordinated and directed way
- Developments of repositories, aggregators, end-user services, and intermediary services should move towards a service-oriented architecture and establish separate layers for the aggregation model to maximise the flexibility available for building end-user services to meet user requirements
- Development of end-user services should include an element of investigation into how the information surfaced through these services will be used. This will help inform the development of the service and feed back to the underlying repositories being exposed through it
- Additional automatic means of generating metadata are required. It is recommended that investigations into relevant techniques and tools be taken forward with some urgency
- Further attention to identifiers, specifically location-independent identifiers, and necessary resolution systems is recommended to provide greater understanding of their benefits and use
- It is recommended that the use of RSS and ATOM be investigated as additional standards to OAI-PMH for use in aggregating metadata and content. They offer the potential of targeted exposure of repository resources that may be beneficial in the development of end-user services targeted at specific communities. It is also recommended that the exposure of repository contents within web search engines be examined in closer detail to assess the paths of exposure that exist and the implications for repositories of exposure via this route
- It is recommended that future work to develop aggregators and/or end-user services include an element of communication and involvement with repositories from the start. This will ensure development does not take place in isolation and increase the interoperability between the three major components of the aggregation model. Where intermediary shared infrastructure is involved those developing this should also be included in relevant communications
- It is inevitable that for an optimally-structured set of repository services to be developed on UK repositories, there will be a continuing need for top-sliced funding for some parts of the system. The JISC will need to plan for this for the medium-to-long term