Jisc

Guide

The text and data mining copyright exception: benefits and implications for UK higher education

Helping you to understand the legal implications of the new text and data mining copyright exception

Archived
This content was archived in February 2018

About this guide

Authors

  • John Kelly

    Subject specialist: strategy (law)

  • Published: 9 February 2016
  • Updated: 9 February 2016

Contents

  • Introduction
  • What is text and data mining?
  • What is the TDM copyright exception?
  • What the exception allows you to do
  • What isn't permitted
  • Practical implementation
  • Technical protection measures
  • Sharing outputs
  • Quotation exception
  • Fair dealing
  • Personal data
  • Supporting resources

Introduction

Changes in the law enable researchers to make copies of copyright material for computational analysis. This guide outlines the implications of the new text and data mining copyright exception [1] for researchers, research support services and librarians in UK universities.

While this guidance provides an overview of the key issues associated with text and data mining at the time, it should not be construed as legal advice.

If you require legal advice, you should seek the services of a suitably qualified and experienced legal professional.

What is text and data mining?

Text and data mining (TDM) is defined by the UK Intellectual Property Office (IPO) [2] as:

“The use of automated analytical techniques to analyse text and data for patterns, trends and other useful information”

Text and data mining usually requires copying works for analysis.

Challenges

As outlined in our Value and benefits of text mining report in 2012, an estimated 1.5 million new scholarly articles are published per annum. The vast quantity of scholarly communication within a given field or discipline means that a single researcher or research group is unable to keep up to date with the research that informs their work or leads to new discoveries.

Additionally, while keyword searches are useful, they match terms rather than meanings, so they can’t capture the semantic nuance the searcher intended and may retrieve unrelated documents that use identical terminology.

For example, tree, branch and leaf have very different meanings in ecology and informatics, something that is easy for a researcher to see but not a computer. [3]
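The ambiguity described above can be illustrated with a minimal sketch. The four short "abstracts" and the `keyword_search` helper below are invented for illustration and are not taken from any real corpus or tool. The search matches words only, so it cannot tell the ecological sense of a term from the informatics sense.

```python
# Hypothetical mini-corpus: two ecology abstracts and two informatics abstracts.
DOCUMENTS = {
    "ecology-1": "Leaf area and branch length were measured on each tree in the plot.",
    "ecology-2": "Soil moisture was sampled weekly across the wetland.",
    "informatics-1": "Each leaf of the search tree stores a key, and each branch a pointer.",
    "informatics-2": "The compiler caches parsed modules between builds.",
}


def keyword_search(query, documents):
    """Return ids of documents containing every query word (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in documents.items():
        # Strip the only punctuation used in this toy corpus, then split into words.
        words = set(text.lower().replace(",", " ").replace(".", " ").split())
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


# An ecologist's query also retrieves the informatics abstract,
# because only the words are matched, not their meaning.
results = keyword_search("tree leaf branch", DOCUMENTS)
print(results)
```

Both an ecology and an informatics abstract are returned for the same query, which is the kind of false match that semantically driven retrieval aims to reduce.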

Solution

TDM offers a solution to these challenges by drawing on computational techniques that allow very large bodies of born-digital content (data, text, diagrams, tables and images) to be mined and analysed. This supports efficient and semantically driven information retrieval.

Changes in the law

Before the introduction of the TDM copyright exception, lawful copying of material for text and data mining analysis was only possible under material-specific licence agreements with varying terms. This led to a lack of clarity and limited researchers to certain types of copyright material, such as published works.

Copyright law has now changed so that TDM carried out by a researcher for non-commercial research purposes is no longer viewed as an infringement. Further changes in the law mean that limited copying of all types of copyright works can be undertaken for non-commercial research.

While the economic barriers and high transaction costs identified in the Value and benefits of text mining report remain, the TDM exception introduced by the UK government in 2014 has gone a long way towards addressing the legal restrictions.

What is the TDM copyright exception?

On 1 June 2014, the UK government introduced a number of reforms to copyright (including the TDM exception) that were intended to “give a number of sectors a legal framework fit for the digital age, removing the burden of unnecessary regulations and helping the UK better preserve and use copyright material”. [4]

The UK IPO describes the TDM exception (section 29A of the Copyright, Designs and Patents Act 1988 (CDPA)) as follows:

“An exception to copyright exists which allows researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, they have “lawful access” to the work). This exception only permits the making of copies for the purpose of text and data mining for non-commercial research.

“Researchers will still have to buy subscriptions to access material; this could be from many sources including academic publishers.” [5]

What the exception allows you to do

Ability to mine all types of content/data

The exception permits any published and unpublished in-copyright works to be copied for the purpose of text mining for non-commercial research. This includes sound, film/video, artistic works, tables and databases, as well as data and text, as long as the researcher has lawful access.

Lawful access to commercial journals/data-sets

The exception allows "computational analysis" of content provided the research is non-commercial and lawful access, such as via licence, is in place. This means that where a researcher engaging in TDM can satisfy the exception, they do not need additional permission. This applies to datasets and journal articles as well as to unpublished copyright works.

For non-commercial research purposes, researchers can mine across a range of sources: materials from different academic publishers, as well as appropriately licensed data/content in their institutional repositories or in other aggregation services such as COnnecting REpositories (CORE).

Example

A linguist examining and analysing medical textbooks, and investigating the kind of vocabulary appearing in them, needs to copy and scan whole works for patterns using automated processes.

Before the 2014 changes, the linguist would have needed permission from each publisher for works where copyright hadn’t expired, a very time-consuming process.

Now, as long as the linguist has access to the work and provides sufficient acknowledgement, they are entitled to copy it for text and data analysis without clearing it with each rights owner in advance.

Creative Commons

Data and content licensed under the Creative Commons (CC) licence CC-BY can be mined for commercial research or any other purpose without relying on the TDM exception. However, given that repositories may include material licensed under different conditions (not only CC-BY), researchers should check and observe the licence restrictions, especially if they're planning to access and share content for commercial research.

Notably, content under Creative Commons licences with a Non-Commercial (NC) restriction can’t be used for any commercial purposes.

No contractual override

The exception contains a provision that effectively means any contractual terms which prevent or restrict the making of a copy for the purposes of TDM are unenforceable.

What isn't permitted

Commercial purpose copies

The exception is very explicit that the TDM should be for non-commercial purposes. Copies made for the purpose of TDM can’t be sold or made publicly available in any way and anybody doing so without permission from the publisher could be sued for copyright infringement.

Copies made for the purpose of commercial TDM must have the permission of the rights holder.

Unlawful access

Researchers may only perform TDM on data/content that they have lawful access to, ie, content that is licensed to or subscribed to by the researcher personally or by the institution on their behalf. It’s possible that such copies could lawfully be shared among a closed group of researchers if the data/content was licensed to or subscribed to by all of the researchers personally or by the institution on their behalf. Otherwise sharing would not be lawful.

Circumventing technical measures

The exception states that:

“Publishers and content providers are able to apply reasonable measures to maintain their network security or stability.” [6]

These are generally measures applied by the publisher that block downloads if they detect uncommon use and potentially suspend access prior to further investigation. Whilst the exception also stipulates that “these measures should not prevent or unreasonably restrict researcher’s ability to text and data mine” [7], no action should be taken by researchers to circumvent these measures.

Any person unlawfully circumventing a technical measure is at risk of civil action or possibly even criminal prosecution.

Sharing the outputs of TDM

If the derived copies or results/conclusions contain copyright material from the original, ie, they don’t just contain facts but also original material, it will still be possible in some circumstances to share the outputs. For example, under another copyright exception, the new quotation exception, it is possible to share results with individuals who were not lawfully involved in the original computational analysis.

This “fair dealing” exception allows limited quotation of copyright works as long as the length of the quote used does not undermine the legitimate (often commercial) interests of the rights holders. Please see the section on “Sharing outputs”.

Practical implementation and recommendations

So how does the exception operate in practice?

Collaborative research projects

Within the context of research projects involving groups of people across institutions, sharing access to a lawfully mined copy is likely to be acceptable as long as each member of the group has lawful access to original content being mined.

Recommendation

Any TDM undertaken by research groups should ensure that all individuals have lawful access to the original work either through their own institution or via registration at the institution where the mining takes place.

For international research projects

To what extent can non-UK based colleagues who are part of a UK research project rely on the TDM copyright exception?

Researchers who are based abroad and not affiliated to a UK institution would need to refer to the copyright law of their own jurisdiction for the equivalent exception.

Where the researcher is based abroad but is registered with the UK institution and has lawful access to the licensed content through it, making copies for computational analysis under the exception should only be done in the UK by UK-based colleagues.

Recommendation

The TDM exception in UK law permits the act of copying copyright material. In order for the individual researcher to be covered by the TDM exception, the act of copying would need to take place in the UK.

Acknowledgement

How far do the original sources require acknowledgement?

The TDM exception requires sufficient acknowledgement of the copied works, unless an acknowledgement is impractical. This is particularly important as often TDM incorporates the works of many hundreds, if not thousands of contributors, and in some cases, acknowledgement of all sources may well be impractical.

Recommendation

Researchers who use defined databases or data-sets should reference these to indicate where the works were obtained.
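One way to follow this recommendation is to record the source of each work as it is mined and then summarise the sources for the acknowledgement. The sketch below is only an illustration: the record structure, the `build_acknowledgement` helper and the source names are all hypothetical, and whether such a summary amounts to "sufficient acknowledgement" is a legal question that code cannot settle.

```python
from collections import Counter


def build_acknowledgement(records):
    """Summarise which sources the mined works came from.

    `records` is a list of dicts with a "source" key naming the database
    or data-set each work was obtained from (a hypothetical structure).
    """
    per_source = Counter(record["source"] for record in records)
    return [
        f"{source}: {count} work(s) mined"
        for source, count in sorted(per_source.items())
    ]


# Invented example records; the titles and source names are placeholders.
mined = [
    {"title": "Article A", "source": "Example Repository"},
    {"title": "Article B", "source": "Example Publisher Data-set"},
    {"title": "Article C", "source": "Example Repository"},
]

acknowledgement = build_acknowledgement(mined)
for line in acknowledgement:
    print(line)
```

Summarising by source rather than by individual work reflects the point that acknowledging every contributor may be impractical when many hundreds of works are mined.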

Technical protection measures

These can be used by rights holders to control access to their works and to maintain security or stability, as long as these measures don’t prevent or unreasonably restrict a researcher’s ability to make copies needed for their TDM.

The law also states that any person unlawfully circumventing a technical measure is at risk of civil action or possibly even criminal prosecution. Many rights holders have technical measures installed that block unusual web traffic, and/or require a dedicated application programming interface (API) to be used for TDM.

However, what “purports to prevent or unreasonably restrict” actually means remains undefined by case law.

Recommendation

Researchers who need to access and process content which is restricted by technical protection measures should negotiate initially with the publisher to authorise access. Circumventing technical protection measures should be avoided.
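Once access has been authorised, a mining script can help keep traffic within whatever limits were agreed so that it is less likely to look like unusual use. The sketch below is a minimal pacing helper under stated assumptions: the `Throttle` class is hypothetical, the two-second interval is an invented figure rather than any publisher's actual limit, and the sketch does not perform any downloads itself.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() before each request to the agreed access route.
throttle = Throttle(min_interval=2.0)
throttle.wait()  # the first call returns immediately
```

Pacing of this kind is not a way around technical protection measures. It only helps a researcher stay within terms that have already been agreed with the publisher.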

Benefitting from UK copyright exceptions

Where a researcher is stopped from being able to benefit from the TDM copyright exception due to the application of technical measures by a rights holder, they are assisted by section 296ZE of the Copyright, Designs and Patents Act 1988 (CDPA).

This section provides a process where a letter of complaint can be addressed to the Secretary of State who will then investigate the complaint.

Sharing outputs

The ability to share outputs depends on the extent to which copyright or database rights exist in the derived materials being shared. Database rights can arise where data is arranged in a specific way.

Copyright covers original work that has been created as a result of an intellectual process. There is no copyright in a fact or a collection of facts unless some intellectual rigour has been applied in the interpretation or presentation of those facts.

If there is no copyright or database right in the material being shared then there are no restrictions on it being shared. For example: a list of numbers reflecting probabilities against certain key terms, or a count of how often specific words appear in a film/song/text is highly unlikely to contain any copyright or database right from the original dataset.

In such instances the data can be shared with anyone, irrespective of whether they have access to the original work or what country they are based in.
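A count of how often specific words appear can be sketched as follows. The sample sentence and the `word_counts` helper are invented for illustration. The output contains only the chosen terms and their frequencies, not the wording of the text that was analysed.

```python
import re
from collections import Counter


def word_counts(text, terms):
    """Count how often each term appears in the text (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term.lower()] for term in terms}


# Invented sample text standing in for a lawfully accessed work.
sample = "The patient reported pain. Pain relief was given, and the patient recovered."

counts = word_counts(sample, ["patient", "pain", "relief"])
print(counts)
```

Whether a particular output is free of copyright or database right still depends on the facts of each case, so this is an illustration of the kind of derived data described above and not a test of what may be shared.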

Quotation exception

Where the derived copies or the results/conclusions don’t just contain facts but include original protected material, it’s possible to share with individuals who were not lawfully involved in the original computational analysis, under the new quotation exception.

The Quotation Exception (section 30 of the CDPA) allows the work of a third party to be quoted to illustrate a point being made, as long as:

  • The work has been made available to the public ie, it’s published
  • The quotation use doesn’t exceed what can be considered "fair dealing" [8]
  • The quotation amount is no more than is required for the specific purpose
  • The quotation is accompanied by a sufficient acknowledgement, where possible.

The quotation exception will be helpful to those who wish to share TDM results that contain some element of copyright from the original work eg, for publication purposes. It will also apply to sharing online. However, researchers not based in the UK would need to refer to the law applicable to them to ensure that any further dissemination/access was lawful.

Crucially, like the text and data mining exception, this exception contains a provision whereby any contractual terms which prevent or restrict copying under this exception, are unenforceable.

Fair dealing

We refer to “fair dealing” throughout this guide. The TDM exception (section 29A of the CDPA) is not limited by fair dealing. It’s only where research outputs containing third party works are shared or published that it will be necessary to take fair dealing into account.

As outlined in the guidance issued by the UK IPO [9], there is no statutory definition of fair dealing: it will always be a matter of fact, degree and impression in each case. Relevant factors identified by the courts include whether the amount of the work taken is reasonable and appropriate, and whether the use of a work acts as a substitute for it, causing the owner to lose revenue.

These factors can help determine whether or not the use is fair. More information is also available from the IPO’s exceptions to copyright guidance.

Personal data

It should be noted that this guide deals with the copyright aspects of TDM. When sharing research data that can be considered personal data, it’s also necessary to address the protection of that personal data.

All institutions have specialists in data protection who can assist with the legal requirements when processing personal data in the first instance.

Supporting resources

  • Content Mine uses machines to liberate 100,000,000 facts from the scientific literature
  • COnnecting REpositories (CORE) facilitates free access to scholarly publications distributed across many systems. It gives access to millions of scholarly articles aggregated from many open access repositories
  • CrossRef text and data mining service provides a CrossRef metadata API for researchers to access the full text of content identified by CrossRef digital object identifiers (DOIs) across publisher sites regardless of their business model. Both components are free to use by researchers and the public.

Further information

  • Find out how LIBER, Europe’s largest network of research libraries, is calling for legislative change to enable better text and data mining
  • Read our guide to understanding intellectual property rights which outlines what you can do to meet your legal requirements.


Footnotes

  • 1 Exceptions to copyright - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
  • 2 Exceptions to copyright - https://www.gov.uk/guidance/exceptions-to-copyright
  • 3 Value and benefits - https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
  • 4 New exceptions to copyright reflect digital age - https://www.gov.uk/government/news/new-exceptions-to-copyright-reflect-d...
  • 5 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
  • 6 Exceptions to copyright - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
  • 7 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...
  • 8 Exceptions to copyright - https://www.gov.uk/guidance/exceptions-to-copyright
  • 9 Exceptions to copyright: research - https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...

About the author

John Kelly

Subject specialist: strategy (law), Jisc

I help members to understand and navigate their legal obligations when deploying their technology-based strategies.

Phone
0203 819 8219
Twitter
@legalFEHE
Email
john.kelly@jisc.ac.uk
