Information Object Numbering Systems
When there is a requirement to identify information objects uniquely, the solution adopted is to number them. How they are numbered and the implications of the systems in use form the basis of this report, which focuses directly on objective 16 (Object Identifiers) of the JTAP development programme.
Executive Summary
The report is in two parts. The first deals, in broadly chronological sequence, with the systems which are currently in operation and those which have contributed to their development. The second provides an overview of these systems and of their evolution, then goes on to consider the general requirements for an identifier scheme together with the special requirements of UK HE. The concluding section makes recommendations on how to proceed, for consideration by JISC itself and by the UK HE community more generally.
Historically, identifiers have had only one requirement: they must be unique in all circumstances. The reason that there are so many different types of identifiers today is simply that they have come into existence piecemeal as each sector has come up with its own solution to the problem as perceived at the time. It is possible to discern four phases in the development of numbering systems, each reflecting the increasingly complex level of identification needed at the time of their inception. We are currently in a phase of convergence primarily motivated by economic considerations. Electronic copyright management systems need concise markers for tracking transactions, while licensing the use of copyright works depends on being able to identify what is being licensed, to whom and under what conditions. A multiplicity of markers does not readily lend itself to efficiently achieving these goals and accordingly considerable work is being done to combine elements into coherent systems.
It has to be remembered that numbering systems or identifiers are only one component in the complex packages of metadata needed to conduct trade in intellectual property. The challenge is to create a framework among rights-holders for electronic IPR trading which is media independent, so that companies which at present specialise in different media such as records, films and books can trade their creations in a single coherent marketplace. Identifiers are the very bricks from which this edifice will be built, since it is essential to identify at the outset what it is that is being traded.
In the UK, university libraries act as the main information acquisition channel for their respective institutions. They also serve as repositories of material which may or may not be made available outside the institution and, where it is accessible to the wider community, may or may not be made available on a commercial basis. However, although the library may be expected to play a central role in brokering educational information to the institution it is by no means the only player. The growth of electronic media means that many institutions have in hand arrangements for the development of repositories of learning material which do not in every instance relate directly to the services provided by the library. In addition to these there may well be departmental arrangements that function quite independently of any wider considerations.
Even at an individual level, members of the institution generate material for publication. Much of this will be in the form of text directed at commercial publishers but it will also include a quantity of learning material intended primarily for internal use.
Looking wider than the individual institution, JISC is in the process of developing its Distributed National Electronic Resource, which will incorporate material from a range of sources and which may well be available on varying terms. The documentation services supported by JISC also provide access to material licensed from bodies which in many instances own considerably more material, some of which may be unavailable because of its format, rights issues or commercial concerns. Whatever the reason for its unavailability, the existence of this material will be of interest to the community.
Wider still, there is useful material to be gleaned from the Internet and services such as ADAM and SOSIG are specifically charged with seeking it out and evaluating it for UK HE use.
What is missing is a structure which will enable an individual to sit at his or her computer, launch a query, and be pointed to a resource which is suitable in terms of relevance, availability and cost. Dublin Core has been developed to aid resource discovery but is still evolving and has yet to be widely implemented. Z39.50 is another possible vehicle but is probably better suited to larger resources.
Although to a considerable degree the agenda for the development of identifiers has been set by commercial interests, there have been broadly similar parallel discussions in the non-commercial sector. Whereas commercial concerns have concentrated on the unique identification of intellectual property, the Internet community has been concerned with the problems of addressing, which involve both identifiers and syntax.
The Internet Engineering Task Force (IETF), the body responsible for the technical standards underpinning the global network, has defined a desirable set of characteristics for identifiers which do not differ markedly from those identified by the commercial sector and which, from a technical viewpoint, are quite uncontroversial. By themselves, however, they do not guarantee success. The scheme adopted would also need to have widespread support and be widely implemented.
One of the IETF requirements baldly states: “An identifier will be supported by a mechanism for resolving it to a URL.” A simple assertion like this glosses over the rather major issue of how this will be done. Even assuming that the mechanism is simply a “straight” translation from identifier to URL without any additional functionality, the database will be huge and will require support commensurate with its size. The IETF, on the other hand, leads a virtual existence, leaving a large question mark over how this mechanism might be achieved, let alone paid for.
A further question to be addressed is that of the relationship between digital and analogue resources. Are the identifiers only to be applied to network-accessible items or should they be capable of pointing to physical manifestations of the same work?
The primary requirement of any numbering scheme is that it should support the efficient exploitation of the material to which it relates. It cannot do this by itself but only as part of a range of tools and techniques which will include indexing, metadata and resource discovery. The UK HE community already has available to it a number of services - not to mention institutions - which are involved in these activities and is also party to a wide range of initiatives, so an early consideration must be the feasibility of a mechanism to facilitate the integration of these various interests into a well-articulated system.
A precedent has been established by the commercial sector in bringing together a range of disparate interests in order to exploit the commercial opportunities offered by the Internet, and its initiatives cannot be lightly dismissed. However, UK HE differs in its requirements since it is both publisher and purchaser of information. This gives rise to a certain amount of ambivalence, not least because a significant body of material generated within institutions must be taken into account even though it is only intended for local use. A pattern of provision such as this strongly suggests the need for a system of resource discovery capable of cascading searches, so as to ensure that each request is met at the lowest possible level.
This model would direct the user to the most cost-effective information source in the first instance and could be further refined to take into account the availability of the sought information in caches and mirrors, thus minimising network traffic. For this to work, services at each level would have to generate compatible metadata, including compatible identifiers, to enable them to lock into the overarching system operating in the global arena.
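The cascading model described above can be illustrated with a short Python sketch. It is not drawn from the report: the class names, the three levels and the identifiers such as "obj-0001" are all hypothetical, and the only assumption carried over from the text is that every level describes its holdings with the same identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    identifier: str  # shared identifier; the value format is hypothetical
    title: str
    cost: float      # charge to the user at this level (illustrative)


@dataclass
class ServiceLevel:
    name: str
    records: Dict[str, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        self.records[record.identifier] = record

    def find(self, identifier: str) -> Optional[Record]:
        return self.records.get(identifier)


def cascade_search(
    identifier: str, levels: List[ServiceLevel]
) -> Optional[Tuple[str, Record]]:
    """Query levels in order (lowest first) and stop at the first hit."""
    for level in levels:
        record = level.find(identifier)
        if record is not None:
            return level.name, record
    return None


# Hypothetical holdings: the same object is held nationally and commercially.
institution = ServiceLevel("institution")
national = ServiceLevel("national")
commercial = ServiceLevel("commercial")
national.add(Record("obj-0001", "Course reader (hypothetical)", 0.0))
commercial.add(Record("obj-0001", "Course reader (hypothetical)", 12.5))
commercial.add(Record("obj-0002", "Journal article (hypothetical)", 8.0))

levels = [institution, national, commercial]
hit = cascade_search("obj-0001", levels)
```

Here the request for "obj-0001" is satisfied at the national level and never reaches the commercial supplier. The sketch only works because all three levels key their holdings on the same identifier, which is the compatibility requirement stated above.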
Any system adopted by HE would need to be compatible with that operating in the commercial sector. In addition, the distinction between commercial operators and others is likely to become less clear cut as organisations increasingly seek to generate revenue by making a charge for access to their material. Developments in the commercial sector are gravitating towards universal adoption of Digital Object Identifiers. This is an extremely flexible system which, because it uses "dumb" numbers that serve only to link items to a database entry, can accommodate any other identifier system as a sub-group without the need to re-number existing items - which in any case would be an impossible task. Its strength as well as its weakness is that, to operate, it requires an infrastructure capable of resolving its identifiers. This means a registration system that will record the allocation of identifiers, together with a resolution system that will allow identifiers to be resolved into meaningful citations and vice versa.
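As a rough illustration of what registration and resolution involve, the following Python sketch keeps both in a single in-memory registry. It is a toy under stated assumptions, not the Digital Object Identifier system itself: the prefix "demo", the identifier format, the class and method names and the placeholder ISBN value are all invented for the example.

```python
import itertools
from typing import Dict, Optional, Tuple

LegacyId = Tuple[str, str]  # (scheme, value), e.g. ("ISBN", "...")


class IdentifierRegistry:
    """Toy registry: allocates opaque numbers and resolves them both ways."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix                    # hypothetical registrant prefix
        self._counter = itertools.count(1)
        self._entries: Dict[str, dict] = {}      # identifier -> database entry
        self._by_citation: Dict[str, str] = {}   # citation -> identifier
        self._by_legacy: Dict[LegacyId, str] = {}

    def register(self, citation: str, legacy: Optional[LegacyId] = None) -> str:
        """Record an allocation; the number itself carries no meaning."""
        identifier = f"{self._prefix}/{next(self._counter)}"
        self._entries[identifier] = {"citation": citation, "legacy": legacy}
        self._by_citation[citation] = identifier
        if legacy is not None:
            # The existing number is kept as data; nothing is re-numbered.
            self._by_legacy[legacy] = identifier
        return identifier

    def resolve(self, identifier: str) -> Optional[str]:
        """Identifier -> citation."""
        entry = self._entries.get(identifier)
        return entry["citation"] if entry else None

    def find_by_citation(self, citation: str) -> Optional[str]:
        """Citation -> identifier (the 'vice versa' direction)."""
        return self._by_citation.get(citation)

    def find_by_legacy(self, scheme: str, value: str) -> Optional[str]:
        return self._by_legacy.get((scheme, value))


registry = IdentifierRegistry("demo")
book = registry.register(
    "Hypothetical Author, A Hypothetical Title",
    legacy=("ISBN", "placeholder-isbn"),
)
report = registry.register("Hypothetical internal report")
```

Because the allocated number is opaque, an item that already carries another identifier is simply registered with that identifier stored in its entry. The cost of this flexibility is the one noted above: nothing can be learned from an identifier without access to the registry.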
The citation process is potentially a very powerful tool which could provide a mechanism for locating known material according to a pre-determined algorithm so that, for example, undergraduate students were not offered material from commercial sources involving a direct cost. It could also provide a mechanism for linking citing and cited works, offer users a choice of different versions or physical formats, and even provide a means of simplifying the preparation of reading lists by linking to ISBNs etc., thus minimising the problems associated with different citation styles, partial references and so forth.
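The point about reading lists can be made concrete with a small Python sketch. It assumes the ten-digit ISBN form and its standard modulus-11 check digit, and shows how differently written references to the same book reduce to one key that a resolution service could then look up. The function name and the sample reading-list entries are hypothetical.

```python
import re
from typing import Optional


def normalise_isbn10(raw: str) -> Optional[str]:
    """Return the bare ten-character ISBN, or None if it fails validation."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if compact.startswith("ISBN"):
        compact = compact[4:].lstrip(":")
    if not re.fullmatch(r"\d{9}[\dX]", compact):
        return None
    # Weights run from 10 down to 1; a final 'X' stands for the value 10.
    total = sum(
        (10 - position) * (10 if char == "X" else int(char))
        for position, char in enumerate(compact)
    )
    return compact if total % 11 == 0 else None


# Three hypothetical reading-list entries citing the same book in
# different styles, plus one entry with a mistyped final digit.
entries = [
    "ISBN 0-306-40615-2",
    "isbn: 0306406152",
    "0 306 40615 2",
    "0-306-40615-3",
]
keys = [normalise_isbn10(entry) for entry in entries]
```

The first three entries normalise to the same key and the mistyped one is rejected, which is the sense in which an identifier sidesteps differences of citation style. Partial references that carry no number at all would still need metadata matching and are outside this sketch.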
In order to achieve these goals it will be necessary for the system to have available to it the metadata containing the information required to facilitate searching and meaningful resolution of "hits". UK HE has a considerable amount of metadata available to it in the form of library catalogues, reading lists, material from services such as ADAM, ROADS etc. so it would make sense if the system were to be adopted as a means of bringing these resources into a coherent structure.
The state of play, then, would seem to be that the International DOI Foundation (IDF), on the one hand, has considerable commercial interests willing it to succeed, together with a well-developed rationale identifying the key elements underpinning that success, while UK HE has considerable infrastructure and assets in both materials and their cataloguing, indexing and metadata. Joining the two into a coherent structure would seem the obvious step.
What is now required is a feasibility study to establish, in the first instance, just what IDF's attitude might be to a suggestion of this kind and, beyond that, just how such a system might work at local, regional and national level. It would also need to investigate links to other initiatives such as the Library and Information Commission's plans for a national library network. The study would have to take into account not only these strategic initiatives but also some of the structural implications, such as whether it is practical to use the Interested Party (IP) numbering system as a standard for identifying persons and corporate entities more widely than in the book industry, and how this would impact on name authority systems in libraries.
The way forward is clearly not a simple one, and any decisions made must take into account not only the numbering systems discussed above but also the metadata and interoperability issues that are central to rights management systems. It is recommended that UK HE work towards a unified system of identifiers and rights management which is not only compatible with but linked to the emerging arrangements being made by the commercial sector.