What is Open Data?

From Paolo Mangiafico, Duke’s Director of Digital Information Strategy:

Open Access is about more than just the publications that are the results of research – it’s also about the data generated during the research process.

While publications have always been “public” by definition (even if not universally accessible), data has more frequently been made available only on request, or when there’s some reason to question the published results.

But there are good reasons to make more data more open more often. A 2009 report from the National Academy of Sciences titled “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age” makes the case this way:

“The advance of knowledge is based on the open flow of information. Only when a researcher shares data and results with other researchers can the accuracy of the data, analyses, and conclusions be verified. Different researchers apply their own perspectives to the same body of information, which reduces the bias inherent in individual perspectives. Unrestricted access to the data used to derive conclusions also builds public confidence in the processes and outcomes of research. Furthermore, scientific, engineering, and medical research is a cumulative process. New ideas build on earlier knowledge, so that the frontiers of human understanding continually move outward.

Researchers use each other’s data and conclusions to extend their own ideas, making the total effort much greater than the sum of the individual efforts.

Openness speeds and strengthens the advance of human knowledge.” (p. 59)

While not all data should be kept and not all data can be shared, policies, processes, and infrastructure are being developed in many fields and at many institutions to promote openness of research data wherever possible. One example based here at Duke is the Dryad repository, part of the National Evolutionary Synthesis Center, which is working with stakeholders from journals and scientific societies to develop data-sharing policies and a place to deposit the data underlying scientific publications. Similar policies have been adopted by funding agencies and are becoming an expectation in many fields.

Making your data openly accessible can also bring more attention to your work.

One study that examined the citation history of 85 cancer microarray clinical trial publications found that

“48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression.” (Piwowar HA, Day RS, Fridsma DB, 2007. “Sharing Detailed Research Data Is Associated with Increased Citation Rate.” PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308)

Want to learn more about open data and how you can share the results of your work? Here are some starting points:

For more information, see the Open Access at Duke web site.

[Image: Open Access logo, designed by PLoS]