One of the goals of the Preservation of Web Resources Handbook is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources.

Preservation of Web Resources Handbook

The Handbook is aimed at an audience of information managers, asset managers, webmasters, IT specialists, system administrators, records managers, and archivists.

Aims & Objectives

Preservation of your Web Resources

Chapter 1: What do we mean by preservation?
Chapter 2: What's on your web?
Chapter 3: What risks and issues are peculiar to websites?
Chapter 4: What are your web archiving requirements?
Chapter 5: Selection
Chapter 6: Web Capture: what and how?
Chapter 7: Tools for the job
Chapter 8: Web Content Management Systems (CMS)
Chapter 9: What approaches and techniques can you use?
Chapter 10: Who wants to keep what, why, and for how long?
Chapter 11: How do we appraise the value of a web resource?
Chapter 12: What about Web 2.0?
Chapter 13: Scenarios and case studies

Influencing the Institution

Chapter 14: What are the drivers for web archiving?
Chapter 15: Some personal perspectives on web preservation
Chapter 16: Responsibility for preservation of web resources
Chapter 17: Institutional strategy
Chapter 18: What policies exist?
Chapter 19: How can you effect change?
Chapter 20: Information Lifecycle Management: Creation
Chapter 21: Can other people do it for you?
Appendix A: Legal Matters
Appendix B: Records management: A guide for webmasters

One of the goals of The Preservation of Web Resources Handbook (PoWR) is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources. Anyone coming for the first time to the field of digital preservation can find it a daunting area, with very distinct terminology and concepts. Some of these are drawn from time-honoured approaches to managing things like government records or institutional archives, while others have been developed exclusively in the digital domain.

PoWR workshops

The Project ran three workshops, on 27 June (London), 23 July (Aberdeen) and 12 September (Manchester). The workshops, organised by UKOLN, were a mixture of presentations and break-out groups, where a great deal of useful discussion took place and many ideas were generated. Much valuable and interesting input was gleaned from the mixture of professionals who participated, including people from a records management background, webmasters, and other information professionals with an interest in web preservation or experience of its difficulties and issues.

The PoWR blog

We built the blog at the very start of the project in April 2008. Several key chapters of the Handbook originated on this blog, many of them starting life as a series of what-if scenarios or actual case studies, focusing on various challenging aspects of web content and the actual use made of systems in a higher and further education (HFE) context. The resulting discussions and comments gave us a great deal of content to assess and assimilate.

The Handbook

The Handbook, written by ULCC staff, is a distillation and synthesis of the material gathered via workshops and blog; it also draws heavily on the expertise of the PoWR team in areas such as website management, records management and digital preservation. The Handbook offers best-practice suggestions and advice for UK higher and further education institutions, to enable the preservation of websites and web-based resources.

We want the Handbook to be accessible and practical, and the content has been structured, as far as possible, as a narrative, starting with familiar ideas and issues, and moving towards more complex issues.

The Handbook is structured in two parts. The first part deals with web resources and makes practical suggestions for their management, capture, selection, appraisal and preservation. It includes observations on web content management systems, and a list of available tools for performing web capture. It concludes with a discussion of Web 2.0 issues, and a range of related case studies. The second part is more focussed on web resources within an Institution. It offers advice about institutional drivers and policies for web archiving, along with suggestions for effecting a change within an organisation; one such approach is the adoption of Information Lifecycle Management. There are separate Appendices covering Legal guidance (written by Jordan Hatcher) and records management. The Handbook also contains a bibliography and a glossary of terms.

The Web landscape

Sometime in the mid-1990s, institutions everywhere will have set up a web server in their Internet domain. At first it was probably a few pages of perfunctory contact details and an institutional overview. In some cases, departments and individuals may have been able to create their own sites in sub-directories in the main domain. Since then, everything about the web has grown phenomenally. Expectations of design and content grew, both for external publicity on the Website and for internal information management on the Intranet. The Web has become the platform and interface of choice for virtually every kind of information system: anything that cannot be found on or through the web is in danger of never being discovered at all.

The kinds of web resources that the PoWR Handbook addresses are the many diverse, and much more sophisticated, descendants of those early web objects. This includes many objects commonly managed in a web CMS, whether available externally or just on the Intranet. Objects may be common native web objects (HTML, CSS, JPG), or other commonly disseminated formats (PDF, DOC, MP3, PPT). They may be database-driven blogs, wikis, or data resources. They may have URLs within the Institution's main domain, or in a subdomain, or within a third-party domain that may be paid-for or not.

The Handbook does not, however, directly address the preservation of:

  • Management information systems which use a web-interface (e.g. Agresso finance system, room booking systems)
  • Library, record, archival and administration systems that manage a well-defined class of resource, like an Institutional Repository (IR) or Document Management System (DMS)
  • Virtual Learning Environments
  • Online assessment systems (including e-Portfolios)

The reason for this is that we see these systems as hermetic and essentially self-managed by their professional user base (librarians, finance departments, teachers and learning technologists). A preservation policy, or an Information Asset policy, must encompass all web resources, including these; but the data in these systems will generally be less at risk than less strictly controlled content on the web. In many cases, these classes of system can be considered highly specialised types of Content Management System, increasingly vested with Web 2.0 features, and therefore much of the advice about CMS and Web 2.0 will be relevant.

Web management and records management

The JISC-PoWR workshops have revealed that web managers are likely to see their main responsibility as being to their users: keeping online systems useful, usable and up-to-date. That alone requires a lot of running just to stand still. In addition to changing technology and standards, and ever-greater demands from creators and consumers of information and publications, there is also an ever-changing regulatory and legislative environment, which may require a complete overhaul of the design of the system. Experience therefore suggests that, perhaps more so than in the library or accounts department, preservation management issues slip easily off a Web Manager's radar, if ever they were there in the first place. As a result, many valuable institutional resources, and records of them, may be at risk of not even being considered for preservation, let alone preserved.

The PoWR project workshops and blog discussions have highlighted some of the cultural and intellectual differences between the aims of records managers and webmasters. This characterisation can imply that they might even have mutually exclusive aims and priorities. Web managers are portrayed as being interested solely in delivering content and information to a community of users and consumers, and as wanting to keep abreast of technological developments, perhaps at the expense of preservation. Conversely, a records manager might like to capture or manage some web-based outputs, but doesn't know how to do it, is afraid of digital records and rarely communicates with institutional IT staff.

This distinction, however, tends to oversimplify the case. Records managers don't have all the answers; they aren't necessarily interested in preservation (archivists do that), and even the best records management programme in the world won't address all web preservation issues. Conversely, leave the management of everything solely to the webmaster and you may risk losing valuable resources. The message is that, if we are to achieve optimal longevity and security for all our web resources, records managers must change, as must webmasters.

Permanent preservation

Some reviewers have commented that JISC-PoWR has not dealt in detail with the permanent preservation process, as set out in the OAIS standard. Outside of the fact that it would probably require a second Handbook to do so, there were several other reasons for this decision:

  1. The specific aims and objectives of the JISC ITT were to raise awareness of preservation issues amongst the web manager community, to establish reasonable strategic principles, and to lay out the practical steps necessary to ensure that web material remains accessible.
  2. Even the international preservation community has yet to engage fully with the issues specific to the permanent preservation of web objects. This would have to include a detailed study of the file formats and file types used on the web, as well as their dynamic behaviours, before attempting to formulate preservation strategies for them. Outside of the ARC/WARC formats, which are really just containers for crawled copies of web pages and their associated metadata, we are not aware of viable, fully developed solutions in this area.
  3. The OAIS model is better understood than it was five years ago, but is still in the process of gaining acceptance within the preservation community (such as national libraries and archives). OAIS is not necessarily gaining ground within HFE Institutions within the UK. The Handbook's emphasis has therefore been on selection, management, and capture. In the event that an OAIS system is implemented, Web-based material will form itself into better Submission Information Packages and Archival Information Packages (SIPs and AIPs, in OAIS terms) when selected and managed in this way, and hence be much more easily prepared for long-term digital preservation.
  4. Websites and web resources, and the tools used to create them, are changing rapidly: Web 2.0 presents both new challenges and some old challenges in new clothes; the Digital Preservation community continues to revisit its understanding of what preservation means when thinking about Web resources, and to highlight the importance of different aspects, like continued access, continuity and persistence.

None of this is intended to imply that preservation is out of scope, but the Handbook takes the view that resources must be captured, managed, and selected before they can be preserved, and the Handbook is designed to assist with establishing these first steps.

Summary
Author: University of London Computer Centre (ULCC) and UKOLN
Publication Date: 31 October 2008