Page:Finch Group report.pdf/26

3.19. Related to such moves has been a growth of interest in exploiting the potential of text-mining tools to analyse and process the information contained in collections or corpora of journal articles and other documents in order to extract relevant information, to manipulate it, and to generate new information. The use of such techniques is not yet widespread, not least because arrangements for making publications available for text mining can be complex, and because the entry costs are high for those who lack the necessary technical skills. But text mining offers considerable potential to increase the efficiency, effectiveness and quality of research, to unlock hidden information, and to develop new knowledge.[1] The Government recently consulted upon the proposal in the Hargreaves Review of Intellectual Property to remove one of the barriers to wider adoption of text mining by introducing a new exception to copyright. This would allow whole copyright works to be copied for the purposes of text-mining and data-mining for non-commercial research.[2] We note that publishers of open access and hybrid journals can generally take a more relaxed view about the rights of users to analyse and manipulate the contents of their journals; but we have not repeated in our own work any investigation of the issues covered by the Hargreaves Report.

3.20. The data deluge. Computational and remote sensing technologies have in recent years created new ways of doing science. They have led to what some have referred to as a data deluge, and a new era of data-driven research. The business of both the public and commercial sectors is increasingly driven by the gathering and progressively more sophisticated analysis of data from a range of sources. It has been estimated that by 2020, 35 zettabytes (1 zettabyte = 10^21 bytes) of digital data will be created each year. Linked data and semantic web technologies promise the creation of new information by deep integration of an increasing number of datasets of growing complexity, and finding new ways of re-using them. It is not our purpose to examine all the consequences of the huge growth in the volume and scope of the data that researchers gather, create and use. Many of the implications are considered in the Royal Society’s report on Science as an Open Enterprise referred to earlier.[3] We note, however, that data is increasingly important in its own right as an output of research; and that there is increasing interest in how to support researchers in managing their data more effectively, and in making it available for others to use in their own research and for other purposes.[4] For the infrastructure and services through which data are made available and readily-usable are now seen as an essential underpinning for successful research.

  1. D McDonald et al, The Value and Benefits of Text Mining, JISC, 2012.
  2. I Hargreaves, Digital Opportunity: A Review of Intellectual Property and Growth, Intellectual Property Office, 2011; Consultation on Copyright, Intellectual Property Office, 2012.
  3. Royal Society, Science as an Open Enterprise, forthcoming 2012.
  4. See, for example, the OECD’s Principles and Guidelines for Access to Research Data from Public Funding. OECD Publications. Paris. 2007; and the guidance produced in the UK by JISC, the Digital Curation Centre, the Research Councils and others. For an example of Research Council guidance, see the Biotechnology and Biological Sciences Research Council, BBSRC Data Sharing Policy, June 2010.