
Mark Walport speaks. Photo courtesy of Torsten Reimer
“It’s a complete no-brainer,” said Sir Mark Walport. The director of the Wellcome Trust was responding to JISC’s Digital Infrastructure Directions report into the value and benefits of text and data mining, which recommends that the UK should create a copyright exception for text and data mining for non-commercial research. “It is critical that we enable researchers to maximise the value of publicly funded published outputs. We need to just get on and do it,” he urged.
It was a view endorsed by his fellow experts on the panel, and the majority of the audience, who had heard one of the report’s authors explain the rationale behind the study and its key findings at an event last night at the Wellcome Trust.
Dr Diane McDonald explained that JISC had commissioned the report because of the need for empirical evidence on the subject – the UK government has stated that policy changes should be based on solid evidence – and that she and co-author Ursula Kelly had used the UK Treasury’s own best practice guidelines to evaluate the research.
The context is that the academic world faces a data deluge. There are 1.5m academic publications every year and two new articles are uploaded to UK PubMed Central every minute of the day. No human researcher could hope to be able to examine the torrent of data in their field, make sense of it and turn it into new knowledge – but computers can. However, while there are some pockets of data mining within UK higher education, concentrated within the biomedical sciences, the entry and transaction costs to this new form of research are, in the main, so high as to be off-putting.

Panel session. Photo courtesy of Torsten Reimer
The availability of material for mining is limited – most text mining in the UK is based on open access publications – and researchers face legal uncertainty as they negotiate a maze of licensing agreements. In addition, there are inaccessible information silos where different corpora of articles come in different formats with different standards and different metadata, making it extremely difficult to search across them. There is also low awareness among both researchers and publishers of the potential for text mining.
Yet, the benefits of lowering the barriers to such forms of research could be significant, not only for UK higher education but also for its economy and for society as a whole.
Professor Martin Hall, vice-chancellor of the University of Salford, offered an example of how data-mined information could have a real impact on public health: these tools could be used to create a cholesterol map of Greater Manchester, which would allow public health officials to focus their efforts where they count and make a significant intervention.
Professor Douglas Kell, chief executive of the BBSRC, meanwhile, pointed to research in his own field and the move towards a more inductive, data-driven model where the research begins with the data and finds a hypothesis that fits rather than vice versa. “Integrative biology requires the use and thus access to data and literature that one did not create itself. Without this, biological research will be stalled,” he said.
It was a point picked up by audience member Philip Ditchfield of GlaxoSmithKline. “There are about 7,000 diseases out there and we can cure about 1% as an industry at the moment. We’re all patients at the end of the day and we need to discover medicines. That’s the priority,” he commented. “We’re a very compliant industry and we want to work with publishers, not undermine their intellectual property. Publishers often say you can mine our content – you just have to ask us. That’s very easy to say and very hard to achieve. It is like in the early days of motor cars when you were allowed to drive down the road but you had to have a man with a red flag running in front of you.”
Removing this red flag, at least for non-commercial research, in the form of a copyright exception to support text mining and analytics, as proposed by Hargreaves, is the key recommendation of the report.