Lessons learned and advice for future digitisation projects
The work of these first-round projects has generated lessons from which other projects may benefit. Their experiences emphasise that successful mass digitisation projects are complex undertakings, and that projects in this area are essentially developing a publishing operation and need to consider all the associated elements.
Lessons learned
General advice
- Ensure that any digitisation project is compatible with the host organisation's wider mission and will attract high-level support within the organisation. The importance of this compatibility was stressed by several projects. It was felt that if funding was sought without a high level of institutional backing, there was a strong possibility that the resources would not be sustainable beyond the end of the project funding. The British Library projects, for example, were linked to that organisation's drive to make its collections more accessible, and the BOPCRIS project tied in with the strategic aims of the University of Southampton and the interests of the university library.
- There is a need to involve individuals with a knowledge of the collection as well as those with technical expertise on project teams.
Procurement
- Ensure that adequate time and resources are allowed for any procurement procedures. These may have to be undertaken as OJEU (Official Journal of the European Union) procurement processes, which can be particularly demanding. In some cases the expertise required to assist with the procurement process may exist elsewhere within an organisation or even outside it.
- Make use of purchasing and legal specialists who can assist with the procurement process
- When considering whether to outsource particular activities, weigh the advantages and disadvantages. For example, outsourcing may mean skills, funds and equipment are not retained within the project, but may prove to be quicker, easier and more cost effective. If choosing to outsource elements of the digitisation workflow, it is vital to put in place detailed agreements on quality control which have been rigorously piloted and tested with suppliers.
- Regular, close communication with suppliers is essential. One example of an effective communication tool is the conversion system design document used by the Medical Journal Backfiles project
- It is important to lay down strict parameters for suppliers and link these to censure measures. More informal arrangements can be quicker to implement, but can lead to difficulties in the longer term.
- Although tendering saves time in the short term, a framework contract can be an advantage in the long term as the specific requirements are clarified early on
- An in-house team may be more willing to respond to the needs of the project and users than a commercial supplier.
- It can be difficult for smaller suppliers to meet the level of reporting required by digitisation projects. Projects should ensure that adequate agreements on reporting requirements are in place with suppliers from the outset.
Quality Assurance (QA)
- Ensure adequate time for quality assurance procedures; it is important to retain control over quality at all stages of a digitisation project. UKOLN provides useful advice and guidance on the quality assurance process. The QA Focus website provides a wide range of resources designed to support those involved in the development of digital services, from those involved in project activities to web developers with a responsibility for managing web and resource discovery services. The resources available on the website include briefing documents and a selection of case studies.
- It is important to ensure that quality assurance processes are well documented and agreed before work starts. If using an external supplier, a sample of the supplier's material should be quality checked early on, and any issues identified, before mass production commences.
- Obtaining high rates of accurate optical character recognition (OCR) for certain materials may be especially challenging. Projects may need to use multiple approaches in order to improve the OCR success rate. The BOPCRIS project, for example, found that achieving a high level of corrected OCR was a challenge for the 18th century materials that were being digitised. This was due to the nature of the orthography and the inconsistency of printing throughout the period. A range of approaches was used to improve the OCR accuracy. These included trialling ABBYY Old English FineReader and undertaking a pilot project to apply text clean-up and mining techniques to names in documents.
- There is a possibility of 'ghosting' in film digitisation which needs to be considered as part of the quality assurance process. The issue is one of 'field dominance' when frames of film are scanned into a video signal as film is transferred to videotape - and the subsequent effect as compression artefacts. The Newsfilm Online project found that content originated direct as a video signal did not show the effect in subsequent transcoding. On the other hand, content which had been originated on film, then transferred via telecine to videotape for subsequent encoding, did have a propensity to show the problem - but not in all files.
- The tools and methods developed by the first round of digitisation projects may be useful for future projects, namely:
- The "Issue Tracker", a web-based tool which enables tracking of quality assurance issues (developed by the Online Historical Population Reports project and available on SourceForge).
- An Image Batch Validator which ensures that file naming and directory structure conventions are adhered to (developed by the Online Historical Population Reports project).
- A Batch Comparator which highlights changes between a batch release and the previous release (developed by the Online Historical Population Reports project).
- A Table of Contents Validator which ensures that the document tables of contents are consistent with supplied images before the database is populated (developed by the Online Historical Population Reports project).
- The workflows and team approach to quality assurance when working with an external supplier, developed by the 19th Century British Newspapers project.
- A quality plan monitored by means of an extensive digitisation matrix, developed by the Archival Sound Recordings project.
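The batch validation and batch comparison checks listed above can be illustrated with a minimal sketch. This is not the code of the Online Historical Population Reports tools: the naming convention (a volume directory holding files named `<volume>_<four-digit page>.tif`), the function names and the use of checksums to detect changes are all assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: <batch>/<volume>/<volume>_<4-digit page>.tif
NAME_PATTERN = re.compile(r"^(?P<volume>[A-Z]{2}\d{4})_(?P<page>\d{4})\.tif$")


def validate_batch(paths):
    """Return (path, problem) tuples for files that break the assumed convention."""
    problems = []
    for raw in paths:
        path = PurePosixPath(raw)
        match = NAME_PATTERN.match(path.name)
        if match is None:
            problems.append((raw, "file name does not match convention"))
        elif path.parent.name != match.group("volume"):
            problems.append((raw, "file is in the wrong volume directory"))
    return problems


def compare_batches(previous, current):
    """Compare two releases given as {file name: checksum} dictionaries."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(name for name in shared if previous[name] != current[name]),
    }
```

Running checks of this kind on every delivery from a supplier, before any material is loaded into a database, is one way to surface naming and versioning problems early rather than at the end of production.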
Metadata
- It is important to design a database with the end user's requirements in mind, for example considering the level of granularity required in metadata for users
- For projects digitising audio, the work done by the Archival Sound Recordings project to create the British Library Application Profile for Sound (BLAP-S) and to define a Unique Identifier (UID) specifically for sound may be useful to consider when planning metadata creation.
- The application of mark up and metadata schemes can be complex. Projects may be employing or adapting metadata schemes which have not been scaled up to such an extent before. At the outset projects should agree a metadata scheme and may need to adapt an existing scheme to suit specific project needs. The balance between the choice of metadata and the interoperability of data should be considered
- At the present time, standards are still being developed in some areas and may not be adhered to by some commercial suppliers. Projects dealing with external suppliers should agree at the outset the level and nature of metadata to be used.
Indexing
- Compromises need to be made between the level of detail of indexing of digitised resources and the resources available to undertake indexing work.
- Cataloguing materials can prove more time consuming in practice than anticipated. In the case of the Archival Sound Recordings project, existing records were inadequate and the material to be digitised needed to be listened to and properly catalogued as a first stage of the project. BOPCRIS has learnt a great deal about how indexing and scanning teams interrelate. The project has mixed mechanised and manual indexing methods to create an index which meets the needs of users.
User consultation
- User involvement is essential, especially at key phases such as content selection and interface design. Projects should consider user input from the outset. This may include assessing users' needs in relation to the final selection of content and the nature of contextual material to be provided.
- End user needs should drive the design of the web interface; user consultation is essential to inform this
- An advisory/steering group with representatives from the HE and FE sectors can be an effective way to maintain close connections between the project and the user community.
- Consider whether users will want to repurpose your materials and whether you need to provide any tools to allow them to do this
- When digitising media which is less familiar to many potential users, consider establishing an outreach post to facilitate communication with potential users and encourage take up of project outputs. Newsfilm Online is an example of a project employing this approach.
- Consider the likely public response to the project outputs early on. This may include planning services to cope with a high level of demand and ensuring systems can support this.
Intellectual Property Rights (IPR)
- Allow adequate time and resources for the complex process of IPR negotiation. Some phase one digitisation projects found that in practice this took longer than was anticipated at the outset.
- IPR considerations may affect the range of materials you choose to digitise, e.g. the date range; it is important to determine this at an early stage.
- For sections of visual resources for which rights cannot be easily obtained, one solution is to 'fuzz' the sections, as Newsfilm Online has done.
- For the digitisation of audio resources, the work carried out by the Archival Sound Recordings project to clear rights for educational use may be valuable as a model to follow.
Project management
- A risk register or log is important to help you manage risk. You may also want to consider establishing a risk panel. This is a model that was effectively employed by the 19th Century Newspapers project.
- The value of effective project management techniques cannot be overemphasised. Professional models of project management such as the PRINCE 2 (PRojects IN Controlled Environments) methodology should be employed. JISC infoNet provides useful guidance on PRINCE 2 as well as other project management approaches.
- Projects should keep detailed and accurate records of activities and progress, including lessons learned. This aids compliance as well as learning and the retention of knowledge, both within a project team and after the cessation of a project.
- All projects should be supported by a project board comprised of experts in the field and representatives of the user community. The project board should provide input at key stages of the project and be consulted when important and high risk decisions are being made.
- Ensuring that advisory and/or steering groups are comprised of members with a mix of technical and subject content know-how is helpful, as issues around content and engaging users can be addressed alongside issues around what is technically feasible.
- Wherever possible, projects should aim to build in the delivery of some samples of their intended outputs throughout the project rather than relying on delivery at the end. This can help to engage potential audiences at an early stage. It also helps to test the processes involved and deal with quality issues with suppliers at an early stage.
- Be aware that for longer term projects, the nature of relationships with partners and third party contractors can change over time due to factors unforeseen at the outset of a project. Projects should try to anticipate this as far as possible and recognise that it can have a considerable impact on the project. Where appropriate, contingency measures should be developed.
- Do not underestimate the time required to recruit project staff.
- Finding a project manager with the range of skills required can be extremely difficult. A team approach may be one way of overcoming this problem. A good project manager needs to be constantly looking ahead and anticipating potential problems.
- With digitisation of newspapers, for example, initial page counts (estimates) can be inaccurate in practice, so it is difficult to make forecasts based on these alone. Preparing materials for digitisation, e.g. repairing pages, is extremely time consuming.
Accessibility
- Address accessibility issues as widely and as pragmatically as possible. In the case of certain types of material such as audio and video, where full transcription is not possible, consider producing a synopsis of the material to enhance accessibility for the user community. Where full transcriptions are possible, the cost and time to undertake this should be built into project planning.
- There is a need to consider how/where large quantities of data will be stored beyond the project end date.
- The TechDis service provides advice on accessibility issues which projects may find useful.
Web interfaces
- The quality of end user web interfaces can be an important factor in the extent to which the user community engages with a particular resource or finds out about the development of a project.
- The resource discovery aspect of any website for delivering digitised content needs to be carefully considered. Usability testing is essential and should include both alpha and beta lab based testing. In addition, if possible, the web interface should be tested on a sample of the user population, which could be through easily accessible groups of potential users or through testing with user group and/or advisory group members. The process should be an iterative one. Projects should consult existing resources to aid with usability testing.
Evaluation
- Projects should consider all stages of the evaluation lifecycle from initial user consultation, usability testing, process evaluation through to post launch evaluation activities. These activities should be regularly reviewed and revised to ensure that they remain fit for purpose.
- Adequate consideration should be given to planning for post launch evaluation activities when the end products of digitisation are made available to users. This may involve drawing on existing resources such as potential users identified through the production stage or project advisory board contacts. Consideration should be given to designing web based systems that can capture valuable evaluation data on usage through web logs.
- Evaluation activities can also serve to promote project outputs. Projects should be prepared to collect evaluation feedback to inform resource development from relevant promotional activities undertaken. This could include screenings and workshop demonstrations.
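The suggestion above that web logs can capture evaluation data on usage can be sketched as follows. This is only an illustration: it assumes the web server writes its access log in Common Log Format, and the function name and the choice to count only successful GET requests are assumptions rather than anything prescribed by the projects.

```python
import re
from collections import Counter

# Assumes Common Log Format, for example:
# 192.0.2.1 - - [10/Oct/2008:13:55:36 +0000] "GET /reports/1851 HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_successful_requests(lines):
    """Count successful (2xx) GET requests per requested path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match.group("method") == "GET" and match.group("status").startswith("2"):
            counts[match.group("path")] += 1
    return counts
```

Counts of this kind show which resources are requested most often, but they say nothing about who the users are or why they came, so they complement rather than replace direct user feedback.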
Promotion
- It is helpful to consider whether establishing a 'brand' for the digitised resource will be beneficial. Brands need to take into account the requirements of all relevant stakeholders and the host organisation.
- Involving distinguished academics in an appropriate subject discipline can be a good way to promote interest within a wider user network
- The use of a wide range of media for reaching users should be considered. This could include more traditional methods such as printed literature, websites and awareness raising events. The Online Historical Population Reports project has produced a good example of a successful promotional booklet. In addition, the use of emerging media e.g. blogs and wikis should be considered as a means to promote resources.
- Where possible, robust web logs of usage should be kept. If it is possible to identify specific types of users, this information should be used to target further promotional activities.
- Projects should consider developing a programme of outreach activity. This can serve to attract new potential users as well as providing teachers with advice on how to effectively embed digitised content into teaching and learning programmes.