The File Format Representation and Rendering Project
The aim of the project is to support the implementation of the JISC
Continuing Access and Digital Preservation Strategy 2002-5: in particular
to contribute to the technical foundation for a long-term digital curation
centre. The key outputs of the project would also be of great use to
existing digital preservation activities within the UK. It is clear that
individual digital repositories cannot realistically develop their own
Representation and Rendering technologies. The cost and high degree of
specialised skills required to implement this infrastructure are
problematic. A project of this kind will utilise the experience of the
digital preservation team at the University of Leeds in order to address
these complex technical issues. The project will further develop tried and
tested technologies conceived by the Cedars and CAMiLEON projects.
Assessment of File Format Information
In order to provide effective preservation of digital materials it is
necessary to understand the formats in which the data is encoded. File
formats are often designed and maintained by commercial companies. The
specifications of these formats are not always made publicly available and
information on obsolete formats is rarely retained.
A survey and assessment of sources of information on file formats and
software documentation will be made at the start of the project. This will
include research into what information is available, who owns the
information and how accurate it is. Sources to be investigated include
information available on the web and in published form. Consideration will
also be given to outcomes from related work on file format information, for
example in the Public Record Office PRONOM project and the Dutch Government
Digital Preservation Testbed (Digitale Bewaring). A specific investigation
will be made into the availability of text and word processor file format
information. This research will form part of the initial scoping and
research for the target formats of the Migration on Request development
(described below) and will be formalised in a report for publication in
March 2003.
Rendering
After retrieving the byte stream of an obsolete digital object from a
repository it is necessary to render it in order to give it meaning.
CAMiLEON has clearly demonstrated the need for advanced rendering
technologies that offer accurate and economical preservation of digital
materials. Migration on Request offers the potential to dramatically cut
the costs of long-term preservation while providing far greater accuracy
than traditional migration.
CAMiLEON successfully demonstrated the principles and effectiveness of
Migration on Request with a practical implementation of a vector graphics
Migration on Request tool.
The proposed project will develop a new Migration on Request tool which
will address textual data formats. The specific formats to be supported by
the tool will be decided following an initial scoping phase.
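
The principle behind Migration on Request can be illustrated with a minimal
sketch: the original byte stream stays untouched in the repository and is
converted to a current format only when a user asks for it. Everything named
here is an assumption made for illustration, not part of the project: the
"OTX1" format (a 4-byte marker followed by Latin-1 text with CR-separated
paragraphs) is invented, as are the Document class and the reader and writer
functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical intermediate form: a document is a list of paragraphs."""
    paragraphs: List[str]


def read_otx1(data: bytes) -> Document:
    """Decode the invented 'OTX1' format into the intermediate form."""
    if not data.startswith(b"OTX1"):
        raise ValueError("not an OTX1 byte stream")
    text = data[4:].decode("latin-1")
    # Paragraphs are separated by carriage returns; drop empty trailing pieces.
    return Document([p for p in text.split("\r") if p])


def write_utf8_text(doc: Document) -> bytes:
    """Encode the intermediate form as present-day UTF-8 plain text."""
    return "\n".join(doc.paragraphs).encode("utf-8")


# One reader per preserved format, one writer per current delivery format.
READERS: Dict[str, Callable[[bytes], Document]] = {"otx1": read_otx1}
WRITERS: Dict[str, Callable[[Document], bytes]] = {"text/plain": write_utf8_text}


def migrate_on_request(original: bytes, source: str, target: str) -> bytes:
    """Convert at access time; the stored original is never rewritten."""
    return WRITERS[target](READERS[source](original))


stored = b"OTX1caf\xe9 menu\rsecond paragraph\r"
rendered = migrate_on_request(stored, "otx1", "text/plain")
```

In a design of this shape, only the writers would need replacing when a
delivery format itself becomes obsolete; the stored originals and the readers
would stay as they are.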
Representation
A Representation System keeps track of different formats of data, how they
are composed and structured and, most importantly, how they can be rendered
for use once they become obsolete. A Representation System manages the
Rendering Technologies that a repository utilises. This includes facilities
to monitor and update rendering information, as existing rendering tools
become obsolete or new tools become available. This "technology
watch" is a crucial function of a digital preservation service.
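
As a rough illustration of the functions described above, the following
sketch models a registry that records which rendering tools can handle each
format and reports formats left without a usable tool. It is a toy in-memory
model under stated assumptions: the class names, method names, format labels
and tool names are all hypothetical and do not describe the Cedars
demonstrator or any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RenderingTool:
    """Hypothetical record for one rendering tool."""
    name: str
    obsolete: bool = False


@dataclass
class RepresentationRegistry:
    """Hypothetical registry mapping a format label to its rendering tools."""
    tools_by_format: Dict[str, List[RenderingTool]] = field(default_factory=dict)

    def register(self, fmt: str, tool_name: str) -> None:
        """Record that a tool can render the given format."""
        self.tools_by_format.setdefault(fmt, []).append(RenderingTool(tool_name))

    def mark_obsolete(self, tool_name: str) -> None:
        """Technology watch update: flag a tool as obsolete everywhere."""
        for tools in self.tools_by_format.values():
            for tool in tools:
                if tool.name == tool_name:
                    tool.obsolete = True

    def usable_tools(self, fmt: str) -> List[str]:
        """Names of tools that can still render the format."""
        return [t.name for t in self.tools_by_format.get(fmt, []) if not t.obsolete]

    def formats_at_risk(self) -> List[str]:
        """Formats for which no usable rendering tool remains."""
        return sorted(f for f in self.tools_by_format if not self.usable_tools(f))


registry = RepresentationRegistry()
registry.register("format-a", "viewer-1")
registry.register("format-a", "converter-2")
registry.register("format-b", "viewer-1")

registry.mark_obsolete("viewer-1")
at_risk = registry.formats_at_risk()
```

After "viewer-1" is marked obsolete, "format-a" can still be rendered by
"converter-2", while "format-b" is reported as at risk and would need a new
rendering tool.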
The Cedars demonstrator showed an advanced, accurate and highly efficient
way of managing OAIS-compliant Representation Information using a Network.
It also illustrated that Representation Information could be utilised in a
distributed way. However, the Cedars demonstrator has a very simple
underlying design. A service-quality system will require a more manageable
database and underlying structure.
For this reason the project will further explore and document suitable
procedures and infrastructure for the ongoing management and maintenance of
a representation system. Particular emphasis will be given to how it might
support technology watch functions. The proposed work will build on
existing research and development from the preservation community and will
provide the design and technical foundation for a service-quality
Representation System.