Interview - Robert Kiley, Head of Systems Strategy at the Wellcome Library

The Medical Journals Backfiles Digitisation Project is a £1.25 million project to digitise around 1.7 million pages of complete journal backfiles. It is jointly funded by JISC and the Wellcome Trust, which are working with the US-based National Library of Medicine. The digitised content will be made freely available on the internet through PubMed Central. We interview Robert Kiley, Head of Systems Strategy at the Wellcome Library, who is managing the project from the Wellcome side. 

JISC: Could you provide me with an overview of the project itself?

RK: The project aims to digitise a number of historically significant medical journals. It came out of a consultancy study we did with HEDS. We looked at a number of collections held in the Wellcome Library (archives, images, printed books, etc.) to try to determine which project would benefit the most people. Digitising medical journals was identified as a project that would meet the needs of a significant number of users. 

The aim of the project is to identify around 15 journals that we consider historically significant, digitise them in their entirety, and make them freely available through PubMed Central.

It isn’t just the archive, however, that we intend to make freely available, but also current and future issues published by participating publishers. In essence, Wellcome and JISC agree to fund the backfile conversion, and in return the publishers (as a condition of participation) have to deposit their current issues in the PubMed Central archive. Research articles deposited in PubMed Central must be made freely available within 12 months of publication, whilst all other content, such as editorials, letters, or reviews, must be made available within 3 years.

JISC: Who do you feel will benefit from using the material once it has been archived?

RK: The research and clinical communities within the UK and overseas, and medical historians, are the key audiences. For example, if you want to understand today’s MMR autism scare, you have to look back to the medical literature of the 1940s and 1950s to really understand the background to the issues. Most of that material is just not available online. This project is one way of facilitating access to these backfiles. 

JISC: You mentioned MMR, do you think there is something in this project for the public at large, or do you think it is too specific?

RK: Everything we digitise will be made freely available online – and I hope that the public will make use of this archive.  

JISC: Can you explain the digitisation process?

RK: We are taking the archives of journals, such as the Journal of Physiology and the Biochemical Journal, and are going to scan every single page. Once scanned, each page is processed with optical character recognition (OCR), which allows every word in the archive to be full-text indexed. 

For every discrete article (such as a research paper, editorial or letter), we will also create an XML citation, which will be added to PubMed Medline. As a consequence, anyone will be able to log onto Medline (the preferred search tool for health professionals) and find an article from the archive, even if the article dates from the 19th century. From the citation, the user will be able to link dynamically to the full text. 

JISC: Could you tell me a bit about your collaboration with the partners in the US and what motivated this?

RK: This is a joint project funded by JISC and the Wellcome Trust, working in collaboration with the US National Library of Medicine (NLM). The NLM are managing the digitisation process and will undertake quality assurance on the archives, to make sure all pages are there and of suitable quality. The NLM are also responsible for hosting the archive, though in time it may be possible to mirror this data to a European PubMed Central node.  

The NLM were the obvious partner as they were already digitising backfiles and had a product (PubMed Central) already online. One of the findings from the HEDS study was that successful digitisation projects tend to be those that have a critical mass of digital surrogates. Little “digital islands” of data do not get used.  

JISC: You mentioned quality assurance – can you tell me a bit more about this?

RK: When the contractor returns the scanned paper archive, the NLM run a series of automated checks. These pick up obvious problems, such as missing pages or pages out of order. In addition, however, the NLM also carry out a manual 10% sampling check, in which the returned PDF is compared with the original published journal. This is labour intensive, but it does help to ensure that the archive is of high quality, something all three partners are keen to see. 
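To make the two kinds of check concrete, here is a minimal sketch in Python. It is not the NLM's actual tooling; it assumes, purely for illustration, that each scanned issue can be reduced to the sequence of printed page numbers in scan order and that the expected first and last page are known. The function names `check_pages` and `sample_for_manual_check` are hypothetical.

```python
import random


def check_pages(pages, first, last):
    """Hypothetical automated check on one scanned issue.

    pages: printed page numbers in the order they were scanned.
    first, last: expected first and last page of the issue (assumed known).
    Returns (missing, out_of_order).
    """
    # Any page in the expected range that never appears was not scanned.
    missing = sorted(set(range(first, last + 1)) - set(pages))
    # A page number lower than the one before it is out of order.
    out_of_order = [curr for prev, curr in zip(pages, pages[1:]) if curr < prev]
    return missing, out_of_order


def sample_for_manual_check(items, fraction=0.10, seed=None):
    """Pick roughly `fraction` of the returned items for manual comparison
    against the printed journal (the 10% figure comes from the interview)."""
    if not items:
        return []
    k = max(1, round(len(items) * fraction))
    rng = random.Random(seed)
    return rng.sample(items, k)


if __name__ == "__main__":
    print(check_pages([1, 2, 4, 3, 5], 1, 5))  # page 3 scanned out of order
    print(check_pages([1, 2, 5], 1, 5))        # pages 3 and 4 missing
    print(sample_for_manual_check(list(range(200)), seed=1))
```

The automated pass catches structural gaps cheaply, while the random sample covers problems such as poor image quality that a page-number check cannot detect.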

JISC: Are the NLM taking responsibility for the physical archiving?

RK: Yes, but remember, there are two elements to this project - a paper one and a digital one.

With regard to the paper archive, we look to the publisher to provide this. Because of the way journals are digitised (issues are de-spined), we ask the publisher for a disposable copy. Once the publisher has supplied the archive, the NLM produce an inventory and check for completeness. They also put together a style sheet to indicate how the journal should be scanned and how the XML should be marked up. Once all this information has been prepared, the archives are shipped for scanning. 

At the same time as the paper archive is being prepared, publishers are asked to send sample digital files to the NLM for evaluation. The purpose of this is to ensure that digital files can be added to the archive with little (or no) human intervention.  

JISC: Technology is moving forward very fast. What is your take on that?

RK: The whole concept behind PubMed Central is that it is a long-term archive. Indeed, PubMed Central's approach underlies the NLM's basic archiving philosophy. The XML is the digital archival copy of record. By creating the online view directly from the XML, the NLM are ensuring that they have an accurate archival record: what you see is what the publisher has archived.  

JISC: Can we move onto the open access issues?

RK: The Wellcome Trust has published two reports related to the Open Access debate. Our view is that the current publishing model does not work in the interest of researchers, libraries, or the public. To try to remedy this, Trust-funded researchers are encouraged to publish in open access journals. Additional funding is made available to cover the author costs associated with this new business model. 

Hopefully, over the next few months, the Wellcome will further develop its OA policy. Encouraging researchers to move to the OA model is one thing, but I suspect that we need to be more proactive. We are the UK's biggest funder of medical research, spending over £400 million per year. With this level of spending, we can help to influence change.  

It is interesting to note that the National Institutes of Health (NIH) have drafted a consultation paper which, if implemented, would require NIH grantees to deposit their research papers in PubMed Central. Such papers would then be freely available within six months.  

The Trust recognises that dissemination of research is part of our mission. The results of the Human Genome Project (of which the Trust was a major funder) are made publicly available via the Internet. We recognise that more and better research would result from making the outcomes of the Genome Project freely available. This approach now needs to be applied to research papers. 

JISC: What benefits are there for the Wellcome Trust to be working with JISC?

RK: In terms of the Medical Journals Project, JISC has been instrumental in bringing this project to reality. JISC committed its funding contribution early on in the project negotiations, and this support was used to lever additional funding from the Wellcome. 

More generally, JISC’s positive stance on OA means that we share the same philosophy of making research freely available to all. 

JISC: What key landmarks will there be in the project?

RK: To date we have secured the agreement of about seven or eight titles from a mixture of publishers. A number of these have already been shipped for scanning, and I anticipate that a couple of titles (probably the Biochemical Journal and Medical History) will be available online by Spring 2005. The other titles will be made available over the following two years. 

JISC: Where can I find more information about the project?

RK: There is, of course, a website with further information, Wellcome Library: Medical journals backfiles digitisation project, where you can also get an up-to-date list of journals that have agreed to participate in this project.