Introduction
Changes in the law enable researchers to make copies of copyright material for computational analysis. This guide outlines the implications of the new text and data mining copyright exception1 for researchers, research support services and librarians in UK universities.
What is text and data mining?
Text and data mining (TDM) is defined by the UK Intellectual Property Office2 (IPO) as:
“The use of automated analytical techniques to analyse text and data for patterns, trends and other useful information”
Text and data mining usually requires copying works for analysis.
Challenges
As outlined in our Value and benefits of text mining report in 2012, an estimated 1.5 million new scholarly articles are published per annum. The vast quantity of scholarly communications within a given field or discipline means that a single researcher or research group is unable to keep up to date with the research that informs their work or leads to new discoveries.
Additionally, while keyword searches are useful, they can’t capture the intended meaning of a search term and may therefore retrieve unrelated documents that use identical terminology.
For example, tree, branch and leaf have very different meanings in ecology and informatics, something that is easy for a researcher to see but not a computer.3
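The ambiguity described above can be illustrated with a minimal sketch. The document snippets and the keyword_search helper are hypothetical, invented purely for illustration; the point is only that matching on a literal term cannot tell the ecological sense of "tree" from the informatics one.

```python
# Hypothetical mini-corpus: these snippets are invented for illustration only.
documents = {
    "ecology-1": "Leaf litter under each tree was sampled along the branch line.",
    "informatics-1": "Each leaf of the binary tree stores a key; a branch holds pointers.",
    "ecology-2": "Soil moisture was recorded at every site.",
}


def keyword_search(docs, term):
    """Return ids of documents containing `term` as a whole word (case-insensitive)."""
    term = term.lower()
    hits = []
    for doc_id, text in docs.items():
        # Strip basic punctuation so "tree." still matches "tree".
        words = [w.strip(".,;:").lower() for w in text.split()]
        if term in words:
            hits.append(doc_id)
    return hits


tree_hits = keyword_search(documents, "tree")
print(tree_hits)  # both the ecology and the informatics document match
```

A researcher reading the two hits would separate them at once; the keyword match alone gives no basis for doing so.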
Solution
TDM offers a solution to these challenges by drawing on computational techniques that allow very large bodies of born-digital content (data, text, diagrams, tables and images) to be mined and analysed. This supports efficient and semantically driven information retrieval.
Changes in the law
Before the introduction of the TDM copyright exception, lawful copying of material for text and data mining analysis was only possible under material-specific licence agreements with varying terms. This led to a lack of clarity and limited researchers to certain types of copyright material, such as published works.
Copyright law has now changed so that TDM carried out by a researcher for non-commercial research purposes is no longer viewed as an infringement. Further changes in the law mean that limited copying of all types of copyright works can be undertaken for non-commercial research.
While the economic barriers and high transaction costs identified in the Value and benefits of text mining report remain, the TDM exception introduced by the UK government in 2014 has gone a long way towards addressing the legal restrictions.
What is the TDM copyright exception?
On 1 June 2014, the UK government introduced a number of reforms to copyright (including the TDM exception) that were intended to “give a number of sectors a legal framework fit for the digital age, removing the burden of unnecessary regulations and helping the UK better preserve and use copyright material”.4
The UK IPO describes the TDM exception (section 29A of the Copyright, Designs and Patents Act 1988 (CDPA)):
“An exception to copyright exists which allows researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, they have “lawful access” to the work). This exception only permits the making of copies for the purpose of text and data mining for non-commercial research.
"Researchers will still have to buy subscriptions to access material; this could be from many sources including academic publishers.5”
What the exception allows you to do
Ability to mine all types of content/data
The exception permits any published and unpublished in-copyright works to be copied for the purpose of text mining for non-commercial research. This includes sound, film/video, artistic works, tables and databases, as well as data and text, as long as the researcher has lawful access.
Lawful access to commercial journals/data-sets
The exception allows "computational analysis" of content provided the research is non-commercial and lawful access, such as via a licence, is in place. This means that a researcher whose TDM satisfies the exception does not need additional permission. This applies to datasets and journal articles as well as to unpublished copyright works.
Researchers can mine across a range of sources for non-commercial research purposes: materials from different academic publishers, as well as appropriately licensed data/content in their institutional repositories or from aggregation services such as COnnecting REpositories (CORE).
Example
A linguist examining and analysing medical textbooks, and investigating the kind of vocabulary appearing in them, needs to copy and scan whole works for patterns using automated processes.
Before the 2014 changes the linguist would need permission from each publisher for works where copyright hadn’t expired - a very time consuming process.
Now, as long as the linguist has access to the work and provides sufficient acknowledgement, they are entitled to copy it for text and data analysis without clearing it with each rights owner in advance.
Creative commons
Data and content licensed under the Creative Commons (CC) licence CC-BY can be mined for commercial research or any other purpose without relying on the TDM exception. However, given that repositories may include material licensed under different conditions (not only CC-BY), researchers should check and observe the licence restrictions, especially if they're planning to access and share content for commercial research.
Notably, content under a Creative Commons licence with a Non-Commercial (NC) restriction can’t be used for any commercial purposes.
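As a rough illustration of the licence check described above, the sketch below sorts repository records by what their licence string implies for commercial mining. The function name, the record layout and the three status labels are assumptions made for this example, and the string matching is deliberately conservative: anything that is not plain CC-BY or visibly NC (including versioned strings such as "CC-BY 4.0") is flagged for a manual check. It is not a substitute for reading the licence.

```python
def commercial_mining_status(licence):
    """Classify a licence string for commercial mining (illustrative only).

    Returns one of:
      "any-purpose"     plain CC-BY, as described in this guide
      "non-commercial"  the licence carries an NC restriction
      "check-licence"   anything else; read the terms before relying on it
    """
    # Normalise "CC-BY-NC" and "CC BY-NC" to the same list of parts.
    parts = licence.upper().replace("-", " ").split()
    if "NC" in parts:
        return "non-commercial"
    if parts == ["CC", "BY"]:
        return "any-purpose"
    return "check-licence"


# Hypothetical repository records, invented for illustration.
records = [
    {"id": "repo:1", "licence": "CC-BY"},
    {"id": "repo:2", "licence": "CC-BY-NC"},
    {"id": "repo:3", "licence": "All rights reserved"},
]

for record in records:
    print(record["id"], commercial_mining_status(record["licence"]))
```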
No contractual override
The exception contains a provision that effectively makes unenforceable any contractual term that prevents or restricts the making of a copy for the purposes of TDM.
What isn't permitted
Commercial purpose copies
The exception is very explicit that the TDM should be for non-commercial purposes. Copies made for the purpose of TDM can’t be sold or made publicly available in any way and anybody doing so without permission from the publisher could be sued for copyright infringement.
Copies made for the purpose of commercial TDM must have the permission of the rights holder.
Unlawful access
Researchers may only perform TDM on data/content that they have lawful access to, ie, that is licensed or subscribed to by the researcher personally or by the institution on their behalf. It’s possible that such copies could lawfully be shared among a closed group of researchers, if the data/content was licensed or subscribed to by all of the researchers personally or by the institution on their behalf. Otherwise sharing would not be lawful.
Circumventing technical measures
The exception states that:
“Publishers and content providers are able to apply reasonable measures to maintain their network security or stability.6”
These are generally measures applied by the publisher that block downloads if they detect uncommon use and potentially suspend access prior to further investigation. Whilst the exception also stipulates that “these measures should not prevent or unreasonably restrict researcher’s ability to text and data mine”7, no action should be taken by researchers to circumvent these measures.
Any person unlawfully circumventing a technical measure is at risk of civil action or possibly even criminal prosecution.
Sharing the outputs of TDM
If the derived copies or results/conclusions contain copyright material from the original, ie, they don’t just contain facts but also original material, it will still be possible in some circumstances to share the outputs. For example under another copyright exception, the new quotation exception, it is possible to share results with individuals who were not lawfully involved in the original computational analysis.
This “fair dealing” exception allows limited quotation of copyright works as long as the length of the quote used does not undermine the legitimate (often commercial) interests of the rights holders. Please see the section on 'sharing outputs'.
Practical implementation and recommendations
So how does the exception operate in practice?
Collaborative research projects
Within the context of research projects involving groups of people across institutions, sharing access to a lawfully mined copy is likely to be acceptable as long as each member of the group has lawful access to the original content being mined.
For international research projects
To what extent can non-UK based colleagues who are part of a UK research project rely on the TDM copyright exception?
Researchers who are based abroad and not affiliated to a UK institution would need to refer to the copyright law of their own jurisdiction for the equivalent exception.
Where the researcher is based abroad, is registered with, and has lawful access to the licensed content via the UK institution, then making copies for computational analysis under the exception should only be done in the UK by UK based colleagues.
Acknowledgement
How far do the original sources require acknowledgement?
The TDM exception requires sufficient acknowledgement of the copied works, unless an acknowledgement is impractical. This is particularly important as often TDM incorporates the works of many hundreds, if not thousands of contributors, and in some cases, acknowledgement of all sources may well be impractical.
Technical protection measures
These can be used by rights holders to control access to their works and to maintain security or stability, as long as these measures don’t prevent or unreasonably restrict a researcher’s ability to make copies needed for their TDM.
The law also states that any person unlawfully circumventing a technical measure is at risk of civil action or possibly even criminal prosecution. Many rights holders have technical measures installed that block unusual web traffic, and/or require a dedicated application programming interface (API) to be used for TDM.
However, what “purports to prevent or unreasonably restrict” actually means remains undefined by case law.
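Because many rights holders block traffic that looks unusual, a mining script that spaces out its requests is less likely to trip those measures. The sketch below shows one way to do that. The Throttle class, the fetch_all helper and the chosen interval are all assumptions made for illustration; the appropriate rate is whatever the rights holder specifies, and where a dedicated API is required for TDM, that API should be used instead. Nothing here circumvents a technical measure, and the fetch function is supplied by the caller, so the sketch makes no network calls itself.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock   # injectable so the behaviour can be checked without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call `fetch(url)` for each URL in turn, pausing between calls."""
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)
    return results


# Usage with a stand-in fetch function and a zero interval (no network access, no delay).
pages = fetch_all(
    ["doc-1", "doc-2"],
    fetch=lambda url: f"content of {url}",
    throttle=Throttle(0.0),
)
print(pages)
```

In real use the caller would pass a function that downloads from a source they have lawful access to, and an interval taken from the provider's published terms.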
Benefitting from UK copyright exceptions
Where a researcher is prevented from benefiting from the TDM copyright exception by technical measures applied by a rights holder, they are assisted by section 296ZE of the Copyright, Designs and Patents Act 1988 (CDPA).
This section provides a process where a letter of complaint can be addressed to the Secretary of State who will then investigate the complaint.
Sharing outputs
The ability to share outputs depends on the extent to which copyright or database rights exist in the derived materials being shared. Database rights can arise where data is arranged in a specific way.
If there is no copyright or database right in the material being shared then there are no restrictions on it being shared. For example: a list of numbers reflecting probabilities against certain key terms, or a count of how often specific words appear in a film/song/text is highly unlikely to contain any copyright or database right from the original dataset.
In such instances the data can be shared with anyone, irrespective of whether they have access to the original work or what country they are based in.
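The word-count example above can be sketched in a few lines. The sample sentence and the term_counts helper are invented for illustration and are not taken from any mined work; the output is a set of numbers against terms, with none of the original wording carried over.

```python
import re
from collections import Counter


def term_counts(text, terms):
    """Count how often each term in `terms` appears in `text` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Counter returns 0 for terms that never appear.
    return {term: counts[term.lower()] for term in terms}


# Invented sample text standing in for a lawfully accessed work.
sample = "The tree shed a leaf. Another leaf fell from the same tree branch."
result = term_counts(sample, ["tree", "leaf", "branch", "root"])
print(result)  # {'tree': 2, 'leaf': 2, 'branch': 1, 'root': 0}
```

Whether a particular derived dataset is free of copyright or database right remains a question of fact for each case; the sketch only shows the kind of output the paragraph has in mind.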
Quotation exception
Where the derived copies or the results/conclusions don’t just contain facts but include original protected material, it’s possible to share with individuals who were not lawfully involved in the original computational analysis, under the new quotation exception.
The Quotation Exception (section 30 of the CDPA) allows the work of a third party to be quoted to illustrate a point being made, as long as:
- The work has been made available to the public ie, it’s published
- The quotation use doesn’t exceed what can be considered "fair dealing"8
- The quotation amount is no more than is required for the specific purpose
- The quotation is accompanied by a sufficient acknowledgement, where possible.
The quotation exception will be helpful to those who wish to share TDM results that contain some element of copyright from the original work eg, for publication purposes. It will also apply to sharing online. However, researchers not based in the UK would need to refer to the law applicable to them to ensure that any further dissemination/access was lawful.
Crucially, like the text and data mining exception, this exception contains a provision whereby any contractual terms that prevent or restrict copying under this exception are unenforceable.
Fair dealing
We refer to “fair dealing” throughout this guide. The TDM exception (section 29A of the CDPA) is not limited by fair dealing. It’s only where research outputs are shared or published containing third party works that it will be necessary to take fair dealing into account.
As outlined in the guidance issued by the UK IPO9 there is no statutory definition of fair dealing - it will always be a matter of fact, degree and impression in each case. Relevant things identified by the courts include whether the amount of the work taken is reasonable and appropriate and whether the use of a work acts as a substitute for it, causing the owner to lose revenue.
These factors can help determine whether or not the use is fair. More information is also available from the IPO’s exceptions to copyright guidance.
Personal data
It should be noted that this guide deals with the copyright aspects of TDM. When sharing research data that can be considered personal data, it’s also necessary to address the protection of that personal data.
All institutions have specialists in data protection who can assist with the legal requirements when processing personal data in the first instance.
Supporting resources
- Content Mine uses machines to liberate 100,000,000 facts from the scientific literature
- COnnecting Repositories (CORE) facilitates free access to scholarly publications distributed across many systems. It gives access to millions of scholarly articles aggregated from many open access repositories
- CrossRef text and data mining service provides a CrossRef metadata API for researchers to access the full text of content identified by CrossRef digital object identifiers (DOIs) across publisher sites regardless of their business model. Both components are free to use by researchers and the public.
Further information
- Find out how LIBER, Europe’s largest network of research libraries is calling for legislative change to enable better text and data mining
- Read our guide to understanding intellectual property rights which outlines what you can do to meet your legal requirements.
Footnotes
- 1 Exceptions to copyright - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
- 2 Exceptions to copyright - https://www.gov.uk/guidance/exceptions-to-copyright
- 3 Value and benefits - https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
- 4 New exceptions to copyright reflect digital age - https://www.gov.uk/government/news/new-exceptions-to-copyright-reflect-d...
- 5 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
- 6 Exceptions to copyright - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
- 7 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
- 8 Exceptions to copyright - https://www.gov.uk/guidance/exceptions-to-copyright
- 9 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...