Open science at web-scale: Optimising participation and predictive potential
This Report has attempted to draw together and synthesise evidence and opinion associated with data-intensive open science from a wide range of sources. The potential impact of data-intensive open science on research practice and research outcomes is both substantive and far-reaching, with implications for funding organisations, for research and information communities and for higher education institutions.
The original specification for the work was highly selective in its choice of areas to study, and this Report addresses only three of these areas in any depth:
- open science, including open notebook science: making methodologies, data and results available on the Internet through transparent working practices
- citizen science, including volunteer computing: where volunteers, who may not have scientific training, perform or manage research-related tasks such as observation, measurement or computation
- predictive science: data-driven science that enables the forecasting, anticipation or prediction of specific outcomes.
Synthetic science (research that combines science and engineering methods to design and build novel biological entities) and immersive science (research involving virtual and simulated worlds) are referenced but require more detailed examination. Fuller definitions of the terms and areas examined in this study are provided in Section 3. In addition, the Report addresses data informatics and the supporting role of libraries for these particular aspects of open science.
The work was undertaken through a mix of desk research (including analysis of the peer-reviewed literature, presentations, selected blogs, wiki content and social network discussion) and consultation with a small group of leading thinkers and researchers. The Report was also informed by presentations and talks given by the author during 2009.
The Report is positioned as a consultative document which, it is hoped, will stimulate and contribute to community discussion in the UK and also fuel the open science debate on the global stage. Whilst many questions are asked here, they will require fuller articulation and investigation in other fora. The economic implications will require detailed analysis, and the societal benefits should be reviewed and evaluated. The consultative questions are clearly indicated in boxes in the text and are reproduced in full in the Executive Summary.
Consultation Challenge 1: Scale, Complexity and Predictive Potential
Data-intensive science, powered by contemporary computational hardware, software and research techniques, enables scientists to perform experiments and calculations at scales and volumes that are orders of magnitude greater than before: research that once took a year to complete can now be repeated in a weekend. Sustained growth in data modelling, complex simulations and visualisations facilitates interpretation and analysis by humans and machines, leading to the development of predictive science scenarios in a wider range of disciplines. Examples of data-intensive science at these extremes of scale, which enable forecasting and predictive assertions, are described in this Report.
Assessments of the accuracy and robustness of predictions are linked to uncertainty quantification, the accuracy of the underlying model, and the integrity of the data. Key questions address community awareness and understanding of the potential implications and impact of (open) data-intensive science at new extremes of scale and complexity, and the service requirements for associated data curation and preservation.
What is the level of awareness and understanding in the wider community of the prospects and societal implications of predictive science?
How are the methodologies and tools for data quality, validation and verification, which underpin robust and trustworthy large-scale models and simulations, implemented in different disciplines? Are appropriate data quality standards in place?
How are the necessary mathematical skills made available to science teams, particularly in domains such as biology?
How can services such as the Digital Curation Centre best support the effective curation and long-term preservation of complex and dynamic data models, simulations and visualisations?
Consultation Challenge 2: Continuum of Openness
Open science has been presented in this Report as a continuum, which is helpful in positioning the range of behaviours and practices observed in different disciplines and contexts. The twin aspects of openness (access and participation) have been separated to help scope the full potential of the open science vision, and the perceived values and benefits of open science are listed. Available evidence suggests that transparent data sharing and data re-use are far from commonplace, and some of the reasons for this are examined. Peer production approaches to data curation are in their infancy but offer considerable promise as scalable models that could be migrated to other disciplines. The more radical open notebook science methodologies are currently on the "fringe", and it is not clear whether uptake and adoption will grow in other disciplines and contexts. The challenge of "openness", across its range of interpretations, demands that we address awareness and understanding of fundamental open science concepts, supplemented by probing exploration of practitioner experience.
What are the views of the community on open science principles, acknowledging that "openness" is a continuum or sliding scale, with different groups, services, information and data positioned at different points?
What are the views of the community on the perceived value and benefits of open science methodologies? How can these benefits be demonstrated and evaluated?
Should research funding bodies be pro-actively supporting open science principles and practice? What are the policy implications? What infrastructure is required?
How aware are the majority of scientists of the range of social Web tools available to support open science? How are the tools used in different disciplines? What are the perceived advantages and disadvantages of using collaborative tools? How can social tools add value to research? What are the costs and benefits of using these types of tools?
What are the implications of open science communication channels (e.g. blogs) for scholarly publishing models? What are the views of publishers and learned societies?
How can the peer production model for data curation be applied and adopted in other disciplines?
What are the community views on Open Notebook Science (ONS)? Should these radical methods be migrated across to other disciplines and, if so, which other disciplines would benefit? What key ONS development and enhancement issues need to be addressed?
Consultation Challenge 3: Citizen Science
Twenty-first-century team science has been empowered by the proliferation of social Web tools that enable globally distributed groups to work together, but we can also envisage team science embracing interested amateurs and citizens as well as research professionals. Some established and compelling exemplars of citizen science are given, although this model may be better suited to certain domains and types of research. However, the growth of mobile phone use in citizen journalism, public census work and participative surveys, together with the development of sensor-rich mobile devices, suggests great potential for more participatory methodologies to benefit scientific research, though some significant privacy and legislative issues remain unresolved.
The influence of computer gaming approaches in motivating participants in volunteer computing initiatives is described, and the development of citizen science Web services, system architectures and appropriate interface design is briefly explored. We need to learn much more about how the public interact with these services in order to maximise the value and benefit of such investment. The basic questions probing citizen science raise significant philosophical and pragmatic issues for professional scientists, research funding bodies, higher education institutions and the wider community.
What are scientists' and funders' attitudes towards citizen science? What are the societal implications? What role should research funding bodies play?
What are the short-, medium- and long-term strategic and policy implications, for science practice and outcomes, of a more openly participative research approach that may pro-actively include the public?
What are the financial implications, both in terms of direct and indirect costs, investment in infrastructure and associated benefits? What are the risks? What is the impact on research quality (data, models, outcomes)?
Which disciplines and areas of research are most suited to citizen science methodologies? How should the collaboration market model be applied to research?
How will open and participative science initiatives impact on research practice in HE institutions? How should professional scientists, volunteers, amateurs and citizen scientists (and all flavours in between) work together in a socially optimal manner where there is mutual benefit? What can scientists learn from citizen journalism?
What are the technical requirements for designing effective citizen science Web services and systems? What can we learn from current successful exemplars?
Consultation Challenge 4: Credentials, Incentives and Rewards
The potential impact of these changing practices on established business models for science and scholarly communications is raised: new notions of reputation and trust are developing which challenge established norms. There is brief discussion of the current journal publishing model, with its associated citation metrics for UK research assessment, which does not reward data sharing, social Web contributions or peer production approaches to data curation. Some novel proposals that seek to include such parameters in research assessment metrics are presented. The implications for research funder policies, future science investment planning and scholarly communication business models are not fully understood, but it is clear that the lack of incentives for data sharing and participatory methodologies is a barrier to wider adoption of the open science agenda. The consultative questions explore incentivising data sharing and re-use, and strategies for enabling more open participation, in the context of open science and scholarly communications.
Should open science practices be formally recognised and rewarded as intrinsic elements of scholarly communications? How can this be best achieved?
What are the views of the research community on appropriate incentives and reward structures for data sharing, data re-use and wider participation?
What are the views of the research funding bodies? Should these types of contribution and associated metrics be included in future research assessment frameworks? How should they be assessed? How is the proposed Scholar Factor perceived? How should such metrics supplement journal citation metrics?
What are the views of scholarly publishers and learned societies? How do these contribution channels affect scholarly communication business models?
Consultation Challenge 5: Institutional Readiness and Response
The open science agenda, with the data-intensive science at extremes of scale described in this Report, has significant implications for higher education institutions at policy, planning and operational levels. This Report raises some preliminary points and an Open Science Institutional Readiness Checklist is given as a brief aide memoire for institutions. It is hoped that by asking basic questions which explore institutional awareness, policy, planning and research practice, the community will begin to explore these substantive issues in more depth.
How aware are institutional senior management teams of the strategic implications of this potentially transformational agenda? How can research funding organisations, the JISC and other research support bodies help to raise awareness amongst institutional leaders? Who will lead and co-ordinate this work? What can be leveraged by partnerships on a global scale?
What are the implications for investment in research infrastructure? What can private sector organisations including ICT companies, contribute? What partnership opportunities arise?
How will academic structures evolve to support data-intensive science at extremes of scale? What institutional policy implications arise from open science practice? How are open scholarly communications channels such as research blogs supported in HEIs? Where are institutions positioned on open data sharing? What are the IPR issues? What are the policy implications for institutions of co-working with non-professionals, i.e. volunteers and interested amateurs? What are the societal benefits?
What guidance is provided for research staff? How are open science issues and practices addressed in staff induction and professional development courses? How can advocacy materials for institutions (e.g. a Team Science Toolkit) help to provide guidance and support for planning, policy development and good working practices?
Consultation Challenge 6: Data Informatics Capacity and Capability
Particular attention has been paid to the provision of data informatics capacity and capability and the role of the Library in this context. The Report asserts that Libraries are well-placed to support research data management but that new skills and roles will need to be embraced by the professional LIS community. Modifications to LIS courses will be required and there are similar training implications for new-entrant researchers and postgraduates, to equip them with the skills and methodologies required for data-intensive science. The UK Digital Curation Centre is a key resource, although the increasing demands on this relatively modest service are challenging. The consultative questions explore the embedding of skills required for open data-intensive science, the role of the Library and Information Services and implications for postgraduate training and LIS curriculum development.
What is the research community view on the current provision of data informatics skills for postgraduates and research staff? If current curricula and training are not meeting needs, how can the position be improved? Should basic data informatics training be a core element of courses? Who should provide this training? What are the costs?
How can research funding agencies best support data informatics skills development?
What is the community perspective on the roles that Libraries and Information Services could play in supporting open data-intensive science? How can academic and research libraries be empowered to engage and participate in team science initiatives?
What is the role of SCONUL, RLUK, CILIP and other professional LIS organisations?
How should Library and Information Science schools address the provision of data curation and data informatics expertise within their courses and programmes?
Finally, it is intended that recommendations for further work will arise from the subsequent community and stakeholder discussion.