Over the last decade or so, travellers moving towards an open future for scholarly communications have picked up their pace, spurred on by shifts in public perceptions of openness and by evolving scholarly views and practices.
National and international initiatives from governments and funders have provided fresh incentives so that, in the summer of 2014, the time was right for a review of achievements so far and an exploration of emerging issues and points of interest.
Report foreword by Clifford Lynch, executive director CNI
At Jisc and CNI we believe that international collaboration is of immense importance. Openness with regard to research cannot be merely a national issue.
For around 15 years, our organisations have fostered collaborative exchanges and activities on issues related to the opening up of scholarly communications because we believe this is a task we can do better together. We are learning from each other’s successes and setbacks and developing approaches that will work not just in our own back yards but also for the global scholarly community.
Over two days in July 2014 Jisc and CNI discussed several aspects of openness. We explored the growing diversity and increasing democratisation of scholarly discourse, the transitional arrangements that are supporting experiments in open access and the growing need to make data, methods and software open. We examined the importance of incentivising researchers to open up their work to scrutiny and collaboration.
This report shares the key messages from our conference, reviews the current open landscape and explores emerging themes surrounding policy, data, access and infrastructure. Importantly, the report also describes some innovative ideas and upcoming developments that will take scholarship further along the road towards being open.
The previous Jisc and CNI conference on opening up scholarly communication was in 2012. Since then much has changed, especially in terms of national policies to support open access to publications and research data.
In the scholarly context, openness is now embedded, and it is remarkable how much scholarly information is accessible today compared with ten or 15 years ago. Following the scholarly lead, openness has now gained traction in wider society, so that many governments, businesses and non-profit organisations are striving to be open.
Major policy initiatives in the UK and US have helped to drive developments
In the UK in 2012, the government-sponsored Finch Inquiry reported on strategies to improve access to internationally peer-reviewed research. In response, the UK has implemented a gold open access policy, and research grants are made available to support one-off article processing charges (APCs), ensuring that papers are open to all rather than accessible only via subscription fees. In the US, the ramifications are still being felt from the White House Office of Science and Technology Policy (OSTP) memorandum of February 2013.
It requires federal agencies with more than $100 million in annual research and development expenditure to ‘make their research results free to read within 12 months after publication’, and this includes research data. At the time of the conference in summer 2014 most agencies were behind schedule in meeting this requirement, but most implemented their policies in response to the OSTP memorandum early in 2015.
Details vary from agency to agency and the policies are still evolving but they apply to journal articles and conference proceedings and are likely to extend to books and electronic theses/dissertations in future. As mentioned above, the US mandate requires data to be made accessible and this is a major shift towards opening up research outputs.
In both countries, funder-driven mandates are critical. In the UK, Research Councils UK (RCUK) has adopted the Finch recommendations in modified form, favouring the gold open access route over the green. In the US, the OSTP-driven process is moving ahead. At the same time, independent funders such as the Wellcome Trust, the Andrew W. Mellon Foundation and the Howard Hughes Medical Institute continue to be influential.
The rise of institutional OA mandates, reflecting the expansion of funder mandates, has been a further important driver of change.
Research Excellence Framework (REF)
A critical development in the UK has been the funding councils’ policy that, from 2016, any piece of work submitted to the Research Excellence Framework (REF) – the key mechanism for assessing the research activities of British higher education institutions (HEIs) – must be deposited in a repository at the point of acceptance for publication.
Each university’s REF score will have a profound effect on its share of the funding pot so institutional managers will need to make sure that their researchers comply. This is a very effective way to push open access right to the top of the ‘to do’ list. While there's nothing like the REF currently in the US, it is clear that the funder mandates for openness will have ‘teeth’: failure to comply will result in the withholding of grant payments or disqualification from consideration for future grants.
Beyond the scholarly article
All these open access developments focus primarily on the journal and the research article. This has been one of the earliest and most intensely explored areas of scholarly communication, but one of the most exciting current developments is the move to a broader view. It is important that dissertations and monographs can also be open, and new models are being explored. Moreover, work is being done to ensure that workflows, code, software, video and other components of a research project can all be made accessible. How should these evolve and interact in a digital world to support ever more open and effective scholarship?
This is a crucial question. Some of these outputs have already been well studied and may have long been the focus of policies and programmes to encourage openness: electronic theses and dissertations are being produced in growing numbers, and the role of open source software in supporting scholarship has received considerable attention. But much more work is needed in these areas, and speakers at the conference explored approaches in more detail.
Open Educational Resources (OERs), while not a primary focus of our discussions at the meeting, are also gaining traction; this is a trend worth tracking, and perhaps would form a useful future focus for joint discussions.
In the last few years, there has been a huge policy drive towards the sharing and management of research data.
‘We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable,’ reported the Royal Society in 2012 in its influential report, Science as an Open Enterprise.
Funders such as the National Institutes of Health (NIH) in the US and RCUK and the Wellcome Trust in the UK have taken a lead with policies on opening up research data, and there seems to be political will. A statement from the G8 science ministers in June 2013 asserts a commitment to making publicly funded scientific research data open to the greatest extent and with the fewest constraints possible, although some legitimate, well-accepted legal, ethical and/or commercial restrictions will sometimes be unavoidable. What these are, and how best to manage them, are subjects for serious consideration.
Read on now for a discussion of this and other key themes that emerged during the conference.
Themes: diverse discourse and the rise of social machines
Scholarly discourse is expanding. It involves more of everything – people, topics, institutions – and it is now only rarely about one author and a single paper. Global library cooperative OCLC has been working through the implications of these developments for research, and its report, 'The Evolving Scholarly Record', is a useful resource. Sometimes data is not neatly linked to, or identified with, a specific research project. Scholarly discourse is much more social and sometimes messier as a result.
But it is also often better. In science, the citizen science astronomy projects on Galaxy Zoo and the Zooniverse platform, and the Cornell University eBird project, all demonstrate the appeal and power of science in which data is collected and shared with the wider community. In the social sciences and humanities, the Digging into Data big data challenges show the range of discoveries that become possible when scholars, scientists and information professionals from Europe and North America come together to collaborate.
Trees and tweets, for example, is a joint venture between Aston University in the UK and the University of South Carolina in the US, analysing huge datasets to map variations in dialect and migration patterns in a bid to understand how far the latter explain regional linguistic variations.
These are exciting developments with major potential to advance research, but they raise structural and legal issues:
- Who has ultimate responsibility for making stewardship decisions about the research?
- Who funds the implementation of these decisions?
- Which repository (or repositories) will hold the outputs?
- What happens in relation to different national governments’ laws?
There are also questions about how research outputs can be made more open for analysis, exploration and reuse.
Standard journal articles cannot do justice to the rigour and liveliness of some kinds of research. Some subjects require multimedia to capture their multiplicity and bring them to life, whether it is data comparing quilts and maps over four centuries or musical analysis. Who will publish that work and how will they do it? However, the change is greater than this: it goes beyond new forms of media and units of research discourse that need to be curated.
We are now witnessing social interactions that might be characterised as social machines. Tim Gowers’s Massively Collaborative Mathematics is an example of new forms of knowledge creation and new ways of doing research. These developments are starting to disrupt traditional research, and they point to the need for new ways to create and credit contributions to research. Gowers has said of crowd-sourcing research:
“It’s like driving a car, whilst normal research is like pushing it.”
Platforms such as myExperiment turn scientific workflows into social objects. PDFs, PowerPoints and other pieces of work can be attached to workflows to gather all the elements together and make them shareable as social objects that can be re-run, repurposed and used in fresh, interesting ways.
Conference speaker Dave De Roure summarised his mapping of developments in scholarship and its communication with this diagram.
Themes: hybrid journals and APCs: a dysfunctional market?
Hybrid journals - publications that allow individual papers to be made open access on payment of an article processing charge (APC) – have been a key feature of this ‘transition’ period of open access.
In the UK the Wellcome Trust spent around £3.9m on APCs for 2,126 articles in 2012-13 at an average cost of around £1,821 per article. Nearly three-quarters of Wellcome-funded research articles in 2012-13 were published in hybrid open access journals; 82% of its expenditure on journal articles is with hybrid publishers.
But the market in hybrid journals is dysfunctional. Indeed, the average APC in a hybrid journal is twice that in a born-digital, fully open access journal. Moreover, there is the problem of the ‘total cost of ownership’ (TCO) for the institution in respect of journals – APCs are one element of the total cost that also includes associated administrative costs and journal subscriptions.
If this phase is a transitional one it shows few signs yet of progressing into something else - something cheaper and more transparent. Very few journals have evolved from subscription through hybrid to fully OA. The hope remains that market forces will drive down prices but it is taking a long time to happen.
The Bjork/Solomon study describes three possible scenarios that could offer solutions.
The first tackles the problem of the total cost of ownership. In this future, funders would only reimburse APCs for hybrid journals that had systems in place to ensure that institutions paying APCs receive equivalent reductions on their subscription payments.
In the second, funders could adopt a ‘value-based pricing’ model in which they would set tiered caps for their maximum contribution towards an APC based on the quality of the services provided by the journal to its authors. A funder might be prepared to pay more to a journal with huge reach than it would to one that doesn’t, for example. So far, it is not clear how publishers’ quality of service might be measured and evaluated.
In the third scenario, funders would pay a set proportion of an APC if the price exceeded a certain threshold; authors (or their institutions) would have to cover the shortfall. This would certainly introduce price awareness among researchers but it would also be much more bureaucratic. And where would the author find the money?
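The three scenarios are, at heart, simple pricing rules, and setting them side by side makes their trade-offs easier to see. The Python sketch below is a hypothetical illustration, not the Bjork/Solomon model itself: the function names, service tiers, caps, threshold, share and all monetary amounts are assumptions chosen for the example, and the total cost of ownership is taken, as described earlier, to be subscriptions plus APCs plus administrative costs.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch of the three Bjork/Solomon funder scenarios.
# Function names, tiers, caps, thresholds, shares and amounts are
# illustrative assumptions, not figures from the study.

Split = Tuple[float, float]  # (paid by funder, paid by author or institution)


def offsetting_split(apc: float, journal_offsets: bool) -> Split:
    """Scenario 1: reimburse the APC only if the journal offsets subscriptions."""
    return (apc, 0.0) if journal_offsets else (0.0, apc)


def total_cost_of_ownership(subscription: float, apcs: List[float],
                            admin_costs: float, offsetting: bool = False) -> float:
    """Institutional TCO for one hybrid journal: subscription + APCs + admin.

    With offsetting (scenario 1), APCs already paid are deducted from the
    subscription, floored at zero, so the same content is not paid for twice.
    """
    apc_total = sum(apcs)
    if offsetting:
        subscription = max(0.0, subscription - apc_total)
    return subscription + apc_total + admin_costs


def value_based_split(apc: float, tier: str, caps: Dict[str, float]) -> Split:
    """Scenario 2: the funder pays up to a cap set by the journal's service tier."""
    funder = min(apc, caps[tier])
    return funder, apc - funder


def copayment_split(apc: float, threshold: float, share: float) -> Split:
    """Scenario 3: pay in full up to the threshold; above it, pay a set share."""
    if apc <= threshold:
        return apc, 0.0
    funder = share * apc
    return funder, apc - funder


if __name__ == "__main__":
    caps = {"high_reach": 2500.0, "standard": 1500.0}  # hypothetical GBP caps
    print(offsetting_split(1821.0, journal_offsets=False))       # (0.0, 1821.0)
    print(value_based_split(3000.0, "high_reach", caps))         # (2500.0, 500.0)
    print(copayment_split(3000.0, threshold=2000.0, share=0.5))  # (1500.0, 1500.0)
    print(total_cost_of_ownership(10000.0, [1821.0, 1821.0], 500.0))  # 14142.0
    print(total_cost_of_ownership(10000.0, [1821.0, 1821.0], 500.0,
                                  offsetting=True))              # 10500.0
```

Read literally, the third rule is discontinuous: with a £2,000 threshold and a 50% share, a £2,000 APC is funded in full while a £2,001 APC attracts only £1,000.50. A variant in which the funder pays in full up to the threshold and a share of only the excess would avoid that cliff; the report does not say which form is intended.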
A fourth ‘magic bullet’ was raised in discussions at the conference. If funders were to follow the lead of Germany’s research body and simply refuse to support hybrids, this would force change in the market. But for several leading UK funders, notably Wellcome, it does not appear to be a viable option because the vast majority of their papers are published in hybrids; authors want to publish in the journal of their choice and such a decision would place the majority of titles out of bounds.
It seems preferable to pursue a range of measures, possibly mixing all three Bjork/Solomon solutions, while at the same time working collaboratively and remaining willing to make some tough decisions.
Focus firmly on the APC
It was pointed out that moving beyond hybrids won’t necessarily resolve the issues of the dysfunctional market. APCs are the central difficulty and as things currently stand journals could flip from hybrid to full open access and still charge the same. The real goal is to be able to have transparent costings where the community feels there is a fair price to be paid.
A final central point is that scholarly journals often have an international focus: they publish contributions from many nations and research funded by a multiplicity of funders. Business interests, traditional funding and access models are going to be disrupted by national policies on scholarly publishing.
Creating desirable economic restructuring will require coordination between nations and across funding bodies (including both public and private funders). It is necessary to reach a better understanding of how the currently uncoordinated national policies on open access support should be aligned to bring about a reshaping of international marketplaces.
With US funder policy (particularly governmental policy) on APCs still somewhat unclear, now is a critical time to begin this conversation.
Themes: the economic future of the monograph and the possibilities for OA
The scholarly monograph continues to have huge importance in the humanities: the Jisc researcher survey 2014 found that 95% of humanities researchers thought it ‘important’ or ‘very important’ to publish research outputs as a monograph. Scholarly monographs also play important roles in other disciplines, such as mathematics. Yet the monograph has sometimes been neglected in discussions about open access.
This is changing.
There are two key issues. One is that the economic aspects of monograph publishing are deteriorating, both for traditional print and for open access versions. The second is that, despite a lengthy set of experiments, there is little consensus about how (or even if) monographs should evolve away from simply emulating print pages in the digital world and genuinely use the new capabilities of the networked information environment. Initiatives are now focusing on both of these main issues.
A key initiative under development by the Andrew W. Mellon Foundation is looking to explore and support the digital monograph. Mellon would provide seed funds to universities and colleges to fund their publications. Institutions would select authors to participate (who could choose to decline and pursue traditional forms and business models of publication) and, for a negotiated price paid by the author’s institution, willing presses would agree to produce a well-designed digital publication.
The presses would agree to deposit the publication in at least one trusted preservation repository with full metadata and make it available online under a Creative Commons (CC) licence. They could also sell derivative works to other markets. Mellon is modelling and consulting on this proposal at the moment.
There is a closely related initiative in the US that is being developed by the Association of Research Libraries (ARL) in cooperation with the Association of American Universities (AAU) and the Association of American University Presses (AAUP). At this point it remains unclear if the efforts will ultimately be merged or operate in parallel – if, indeed, they go forward at all.
In the UK, the transition to open access monographs is being explored by a HEFCE panel of experts and the OAPEN-UK project, which is building up an evidence base on the opportunities and challenges of open access monographs.
As with journals, scholarly monograph publishing is an international marketplace. As new initiatives emerge, this seems an ideal time to coordinate efforts that share broad common goals to create sustainable, open markets supporting disciplinary needs for scholarly communication in the digital world.
Themes: the rise of new big data sources as evidence to support research
In the social sciences, open scholarship is embracing big data and exploring new ways to use large datasets collected by diverse organisations, particularly social media companies. This includes interactional, attitudinal and behavioural data. The development brings huge potential to ask new questions and answer them in real time.
So it is encouraging that there is increasing interest in determining exactly what needs to be shared – data, methods, software – to enable intelligent openness in scholarship. More journals now require the sharing of data as a condition of publication, and the recent Public Library of Science (PLOS) policy requires authors to indicate where the dataset is stored at the time of submission, making it easier for reviewers, editors and readers to access that information reliably when they read the article.
But greater openness brings new challenges as well. One particularly challenging area is privacy and ethics. There was a media storm in mid-2014 when it was discovered that Facebook had enabled research on its users without their knowledge. According to the resulting paper’s abstract, it shows, ‘via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness’.
The ethical and regulatory issues are very complex and not fully resolved but the individual’s rights to privacy and confidentiality are the focus of much concern. It is clear that the space between what is possible and what is ethically and legally allowed is widening rapidly. In response, the balance of power may well shift further and further in the individual’s favour. Bodies collecting data will need to consult, explain, and gain informed consent before gathering and using data, and also to facilitate access to the data that they hold if subjects request it. It would be wise also to have a robust, practical plan in place to rectify errors and breaches if these occur.
Nonetheless, social media is opening up new social processes created by citizens in real time and offering social scientists opportunities to do research that would not previously have been possible. Twitter is proving to be an exciting new source of data, particularly for retrospective analysis. A multidisciplinary team, funded by Jisc and working with the Guardian newspaper’s Reading the Riots investigation, analysed 2.4m tweets from the time of the 2011 London riots to discover how the network was used during the disturbance – and afterwards to organise the clean-up operation.
Themes: the management of research outputs: supporting reproducibility, reuse and repurposing
As the Facebook example in the previous section shows, complex privacy and sharing challenges surround the use of commercial data by researchers. But, as research expands, is there a more general crisis of reproducibility?
Of course, the data that supports a claim must be accessible to ensure the credibility of any scientific enterprise, yet it is not always available. A study published in the UK scientific journal Nature, examining a set of cancer research papers, found that 89% of cases could not be replicated – the data or the metadata were either not present or inadequate, making it impossible to scrutinise the evidence in a meaningful way.
In the social media arena, yet more potential perils of working with new big datasets have emerged. Several years ago Google ran a project mapping influenza outbreaks by looking for clusters of queries about flu symptoms. At the time the study was received with interest and repeated but, after a few years, the project’s forecasts went unexpectedly off track. The way people described flu symptoms had apparently changed, so Google’s algorithm no longer correctly identified flu outbreaks. The company will not disclose the search terms it uses to track flu, so researchers cannot collaborate to verify what has happened.
The differences between kinds of dataset must be acknowledged and planned for – huge physics datasets and smaller plant biology datasets have very different requirements. The difficulties some researchers face around reuse and repurposing were memorably summed up at the conference in the phrase “I’d rather use someone else’s toothbrush than their taxonomy…” We may need to develop more support to enable access, sharing and repurposing.
Increasingly, in the age of the cloud, storing datasets is not necessarily an issue in terms of either cost or availability. The big cost is accession – ensuring that data is documented in such a way that it is discoverable and useful. Data that is not documented in this way is effectively digital landfill.
In a Jisc survey on open data sharing, over half the respondents said that ‘time to document and clean up’ data was a barrier to sharing data openly, so should researchers employ others to help with data housework? If looking after data and making it accessible is a fundamental part of science, using some resources to do it properly ought perhaps to be seen as a legitimate use of a research grant. Inevitably, this leaves less funding for the research itself but, as processes become easier and more embedded, good data management will become a routine element in researchers’ workflows.
The issue of compliance with funder policies and what that means in practice continues to be of concern. Institutions recognise that they need to understand how to comply with funders’ policies and in particular:
- How to select data
- How to determine what data is of long-term value
- The implications in terms of storage and curation – not just in order to comply with the letter of funder policies but to support long-term access and reuse
Work is needed to understand and implement systems that will support reproducibility.
Inevitably, disciplinary differences mean that, as yet, there are no definitive answers.
New processes and infrastructure are required to enable researchers to make the full range of their research outputs available to others in digital form.
One area that has been widely researched is the cost of digital curation. Although various cost models for preserving and curating data have been developed over the last ten years, no open generic model has gained traction across the various stakeholder communities.
Themes: sharing and sustaining software
Sharing data openly is of limited use without the software to make sense of it. Software sharing and sustainability emerged as an important topic at the conference, running through many of the discussions.
It is becoming accepted that software and data are both integral parts of any reproducibility strategy.
Journal publishers increasingly recognise software sharing as a necessary next step. Take, for example, the Journal of the American Statistical Association (JASA): in 1996 no code was made available with its papers, but by 2011 the figure was 21%.
Rapid advances in virtualisation and emulation technologies are opening up new approaches to sustaining software. However, much remains to be done. Software is still too often taken for granted, and the issue of software sustainability has received significantly less attention than it deserves.
Support for software as a functioning part of the research infrastructure has too often been side-tracked into discussions about software preservation or completely derailed into debates about video games as cultural heritage. But these are different kinds of problems to the increasingly urgent need to be able to rerun experiments and make sense of their underlying data, which requires a set of software services that complements data and scholarly output repositories.
We are just beginning to understand the complex connections between sustainability and preservation and the ways in which the research community and commercial players, and open and community source approaches, interact in this area. In terms of sustainability and preservation there is often a conflation between maintenance and reuse, reproducibility and verification.
Data and software have common problems and the similarities between them are becoming clearer. Although data is frequently treated as a static asset we are starting to see the ways in which data is actually more like software.
Themes: rewards, incentives and credit
Recognition and reward mechanisms – for sharing, for creating new kinds of research output and for engaging in innovative, non-text-based digital scholarship – continue to be a source of intense frustration and debate. Researchers, whose work still tends to be evaluated based on journal impact factors, naturally fear damage to the established models in which reward is so closely bound up with publishing in elite journals.
Are we seeing any systematic move towards changing those reward mechanisms to ones that will instead incentivise and reward innovative research practice?
The San Francisco Declaration on Research Assessment (DORA) has garnered a significant amount of interest; it has more than 12,000 individual signatories and 500+ allied organisations. It urges evaluation of researchers based on the quality of the work they do and not where they publish it, with its key recommendation that funding agencies and academic institutions ‘do not use journal-based metrics, such as journal impact factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions.’
Early-career researchers in competitive environments need strong incentives to share their data. Grant funders are now starting to show interest in data deposits as important contributions to scholarship, along with activities such as the development of scientific software; increasingly, these can legitimately appear on a CV sent to a funder. If data citation can be established as a marker of performance, that would be a win-win: more data would be made open, and sharing it would help advance researchers’ careers.
Certainly, lack of attribution for sharing data inhibits the development of open data. More metrics on data usage were discussed as a possible solution, including better use of data citation indices, identifiers and standards. But ultimately the DORA declaration, with its emphasis on assessing the work itself rather than where it is published, is very wise.
Over time, the greatest incentives will result from a culture change in which sharing and pooling data is recognised as an important contribution to the advancement of scholarship and the scholarly community considers data sharing part of a holistic, informed and thoughtful assessment of an academic’s contributions, particularly in high stakes settings such as tenure and promotion reviews.
Themes: opening up cultural heritage
How are our cultural heritage organisations opening up? This topic was recognised as an emerging theme, bridging our cultural heritage and stewardship institutions and the core research and education enterprise. The line between opening up scholarship and opening up access to cultural heritage that represents essential evidence for scholarship is increasingly blurred, with some large museums, such as the Getty and the Metropolitan Museum of Art in New York, adopting very open policies.
Similar developments are taking place in university museums and archives and in special collections at research libraries. A number of academic libraries, such as Cornell University’s, have instituted policies allowing individuals to use material from their special collections (such as digitised images) without fees, with attribution, where the material is either in the public domain or Cornell holds the rights. It’s an area to watch and one where we are going to see much more focus as the move towards opening up research and scholarly communication continues.
The Digital Public Library of America (DPLA) and Europeana – digital platforms used by heritage and cultural organisations to bring digitised resources together – each have explicit policies on openness, accessibility and transparency. They are each creating what the DPLA describes as ‘an open, distributed network of comprehensive online resources… to educate, inform and empower everyone in current and future generations’, and they’re working actively on projects that further that goal.
These include initiatives to support educational use of resources and development of e-books, as well as advocating for reform and international standardisation of copyright law and usage rights. This last is a key issue.
National libraries also play an important role. The US Library of Congress has been digitising Americana collections for the past two decades. The National Library of Wales has had open access policies for many years and has digitised a critical mass of Welsh-related heritage material, much of it with Jisc’s support.
The British Library (BL) and its Sound Archive have also made a lot of their collections openly available, and the BL has reported a substantial jump in interest and visitor numbers that can be directly attributed to the decision to release a million images from 17th–19th-century print material.
The initiative that has probably been referenced most in the last few years is the Rijksmuseum’s bold step to make more than 200,000 high resolution images available for free over the web with no restrictions on use. This initiative replaced an image supply service that was limited in its capability, expensive to administer and provided only very modest returns.
The new open policy has surprised the museum staff by prompting large numbers of users to engage in fresh ways with the collection. University libraries have also done some excellent work, and research funder the Wellcome Trust has developed extensive digital collections, including 15m pages of 19th-century medical books that are to be made available openly via the Internet Archive, the Wellcome Digital Library and the Jisc Historical Texts platform.
Continuing the journey
Jisc and CNI are committed to playing our part in planning next steps in the journey towards openness.
Through our ongoing collaboration we are coming to a more thorough understanding of the environment and the issues (for example market dynamics and the interaction of various national and sectoral funder policies), and we are working on strategies to change the environment through the development and deployment of new policies, technologies, practices, and business models.
The priorities are:
- Predictive modelling around hybrid journals
Develop models to explore how national and other funder policies affect the international scholarly communications ecosystem and particularly its economics
- SHARE (Shared Access Research Ecosystem)
Coordinate national inventory and notification systems such as SHARE in the US, Jisc Monitor and Publications Router in the UK, and OpenAIRE in Europe. The aim would be both to identify how such initiatives can collaborate operationally to meet the needs of research and to inform a more future-oriented view of an event-based notification infrastructure across a wide range of research activities and outputs
- Identity management
It is clear that large categories of research data (particularly involving human subjects) cannot be shared without some constraints. A fundamental infrastructural requirement for controlled or constrained data sharing is a solid agreement on technical, legal and policy frameworks for managing, trusting and authenticating the identity of researchers who want to make use of such data. Continued broad-based international conversations are necessary to resolve the issues in these areas
- Develop data, evidence and stories to help convince researchers to change their behaviour
- Explore approaches to licensing, technical standards and reuse that will support sharing
- Use international forums to envision and define what the future of digital scholarship is, so we can see the future possibilities and work towards these
It is important to note that all of these areas, and indeed others discussed at the conference, are very dynamic, with developments happening quickly. We have included a short section of notes on post-conference developments and reflections, but even these have been partially overtaken by events. There will be much to talk about as we continue the journey.