Open Access to data

Because organising and using data via data centres and data sharing is becoming increasingly important for research, it is essential that research data, and not only publications, be openly accessible. Moreover, since every publication in the empirical sciences is based on data, the Berlin Declaration on Open Access (OA) applies just as much to data as it does to publications.

Research data can be integrated into publications, documented indirectly (for example via links in publications), or made available as independent data sets. Data is mainly collected in academic and university research (small science). Because of the wide range of research conducted in this area, it offers the greatest potential for providing open access to data and enabling its (re-)use.

Since research data is becoming more and more extensive and complex, it is rarely presented in the publications themselves, for example in tabular form. Recent cases of data manipulation and forgery highlight the importance of Open Access to the original data as a means of ensuring the verifiability and reproducibility of research results.

Big science is particularly data-intensive. For example, work in disciplines such as bioinformatics, (empirical) geoscience and environmental sciences is based primarily on data which is collected, analysed and interpreted collaboratively. Indeed, big science is mainly organised collaboratively and furnishes prime examples of the current structural transition to e-Science. Collaborators are linked as users and suppliers via data sharing, and the data is stored in data centres or databases which are often linked or grouped together in clusters.

Open Access to data is especially worthwhile because of the added value it brings, and it opens up completely new opportunities for research. GenBank and the Protein Structure Database are two exceptionally successful examples: "The success of the genome project is in no small part due to the fact that the world's entire library of published DNA sequences has been an open access public source for the past 20 years. If sequences could be obtained only in the way that traditionally published work can be obtained – there would be no genome project" (Patrick Brown 2004). In another example, combining historical DNA, environmental and other data made it possible to identify cholera distribution patterns that would not otherwise have been detectable.

Advantages of Open Access to data

In a nutshell, the main advantages of Open Access to data are:

Promotion of Open Access to data by scientific organisations

In disciplines such as astrophysics, high energy physics and molecular genetics it is customary for data either to be made accessible shortly after collection, or to incorporate links to data sources in publications, or to deposit the data on which the publications are based in a central database.

CODATA (Committee on Data for Science and Technology), a committee of the International Council for Science (ICSU), is the international organisation dealing with the quality management and exchange of scientific data. In its Principles for Dissemination of Scientific Data, published in 2002, CODATA endorses Open Access to data.

In its Declaration on Access to Research Data from Public Funding, the OECD's Committee for Scientific and Technological Policy (CSTP) commits in principle to Open Access to research data, while giving due consideration to intellectual property and economic interests. The National Institutes of Health (NIH) make data sharing a condition of award for grants of $500,000 or more. In its Policy on data management and sharing (January 2007), the Wellcome Trust requires that data generated by the research it funds be shared; in line with its Position statement in support of open and unrestricted access to published research, it also makes grant approval conditional on providing the freest possible access to research results. The strategy and work programme of the Helmholtz Association provide for the storage of primary scientific data in the organisation's own data centres. The German Research Foundation (DFG), by contrast, merely obliges grantees to archive data for a minimum period of five years.

Prerequisites for data publication

Some of the prerequisites for data publication, such as integrity and long-term availability, are the same as those that apply to scientific and scholarly publications. The following criteria are important:

The German section of CODATA initiated a project entitled Publication and citation of scientific primary data, which was funded by the German Research Foundation (DFG) from 2003 to 2005. The project was carried out jointly by TIB (German National Library of Science and Technology) and the four German World Data Centers in the field of geosciences.

Legal issues

There are specific legal issues associated with Open Access to research data. Until recently, authors of data were advised to use the Creative Commons licence system to safeguard their rights when making their data openly accessible. Work is currently in progress on a new licence that will comply with the recommendations of initiatives such as Science Commons regarding the implementation of Open Access to data.

Science Commons was launched in 2005 under the auspices of Creative Commons to meet the complex demands of Open Access to scientific and scholarly data, tools and materials. Its goal is to facilitate access to, and the use and re-use of, research data, and to identify and dismantle unnecessary barriers to the exchange of such data. Also in 2005, another project dedicated to Open Access to data was initiated: the Global Information Commons for Science. It was launched jointly by CODATA, the World Data Centers (WDC), the OECD, Science Commons and other organisations with the aim of coordinating the various initiatives dedicated to Open Access to research data and, in particular, of facilitating the re-use of the results of publicly funded research.

EU legislation is a barrier to Open Access to data because it establishes a sui generis right for databases in EU member states, irrespective of whether they are also protected by copyright. As a result, at least within EU member states, such data cannot be used by others without the permission of the rights holder. For data produced under the authority of German federal ministries and agencies, shared use in the sense of Open Access is hampered because the data-producing institutions (for example land surveying offices, the German Remote Sensing Data Centre (DFD) and the German Weather Service) are partly self-financing and depend on revenue from the sale of their data.

Infrastructure

Supporting and promoting Open Access to data requires a suitable infrastructure, especially for large-scale research. The main organisations responsible for building data centres are research funders, universities and public research bodies. They are also responsible for formulating policies on the selection, access and use of the data accumulated within their remit.

Collaborative and discipline-specific initiatives devoted to Open Access to data

At present, the main activities and initiatives devoted to Open Access (OA) to data are discipline-specific. They can be classified as follows:

Open Access to data: prospects and barriers

There are many barriers to Open Access to data:

Data sharing, especially open data sharing, opens up new synergies for research in all areas in which data is used or collected. It is therefore an important issue for research and research funding.

In June 2009, the Electronic Publishing Working Group of the German Initiative for Networked Information (DINI), in collaboration with the Helmholtz Open Access Coordination Bureau, published a position paper on research data.

Further information on Open Access to data can be found on the Helmholtz Association web pages.

Links for further reading