
Scoping study on repository version identification

Sally Rumsey & Frances Shipsey (LSE), Michael Fraser & Howard Noble (Oxford University), Mark Bide, Hugh Look & Deborah Kahn (RIGHTSCOM), March 2006

Report summary

A report investigating the complexities behind version identification of material in repositories. That is: how to label each version correctly (whether drafts or different versions of datasets), and how to identify the 'final', most complete version. This is the one that most accurately represents the 'published version', which is not necessarily the one that was completed last.

The report begins by defining a number of terms and exploring the semantics of the different versions of material that may be deposited in repositories. The authors make the interesting point that the confusion and difficulty surrounding these terms (especially the near-legalistic term “version of record” for the best version of an item in an archive) illustrate some of the complexities involved in working out the significance and degree of completion of many items placed in repositories. Even a fully formalised labelling scheme, however, does not fully address the central question of how to establish a 'version history'.

They then work through a series of 36 scenarios that the repository version identification project devised at a workshop to highlight potentially tricky areas, such as: how to label versions of digital images; how to cope with dynamic, ever-changing project group wikis; and whether photographs of the same item that fulfil the same function should be regarded as different versions of each other. They then propose potential solutions.

The next section looks at current labelling practice in various repositories, followed by a section on metadata harvesting protocols. There is also a more detailed investigation of how the solutions proposed for the various scenarios could be implemented, with reference to current practice.

They recommend further dialogue about repository version identification, awareness raising, more detailed analyses of the requirements for version identification within repositories, and the formulation of a consistent set of semantics.

Key points

Researchers

Authors’ primary interests in version identification are likely to be to:

  • ensure collocation of all the versions of their work that they wish to disseminate; this is likely to be especially significant for processes such as citation analysis and other measures of impact, which matter to most authors
  • ensure that their moral right of paternity is properly exercised – in almost all circumstances, authors expect to be recognised as such

“In some circumstances, ambiguous, misleading or unreliable metadata may be worse from a user point of view than no metadata at all.”

Institutions

“It is unclear whether it should be the responsibility of a repository to maintain a record of all the network locations of digital copies of resources. Such a service would seem to be best abstracted as a distinct shared service registry.”

“We recommend that IR managers should consider the implications of introducing version identity management policies as early as possible in the development of their IRs…”

“We recommend that each IR should develop and implement guidance statements for repository users (both depositing users and readers/ researchers); and should provide the greatest possible support to depositors in following the guidance given (either through the use of very simple but intuitive user interfaces, and/or through expert human mediation).”

Read the full report (PDF)
