
The citation advantage for OA data

As an added benefit of the close proximity of the Science Online 2011 conference, we in the Duke Libraries were fortunate to have a chance earlier this month to meet and talk with Heather Piwowar.  Heather is a post-doc researcher working with the team developing Dryad, a data repository sponsored by the National Evolutionary Synthesis Center, and the author of the Research Remix blog.  She is also the corresponding author on a paper published in 2007 in PLoS One documenting the citation advantage gained by scientific papers that make their underlying data available for open access.

Since the article was new to me, I am writing about it (perhaps selfishly) even though some of my readers may already be aware of its results.  It was especially valuable to read it after having a chance to discuss the topic with Heather, who expanded a little on the possible reasons for the demonstrated advantage.  The basic result was a 69% increase in citations for those papers that shared their research data.  Both in the article and in our conversation, Heather (and her co-authors) scrupulously noted that this advantage might not be causal.  But it is large enough to lead one immediately to speculate on why it might be causal, and that is where the really interesting possibilities lie.

The most obvious connection between all open access and the documented citation advantage is simply increased visibility.  If more people can find an article there is more opportunity for it to be cited in later works.  But with data there is another potential connection, which is that other researchers will be able to re-analyze the data and develop new research questions, or new approaches to the same research question, based on that openly available data; as the authors phrase it, “these re-analyses may spur enthusiasm and synergy around a specific research question.”

In our conversations, another possible reason for the citation advantage emerged, and it has a nice parallel with the open access advantage for journal articles.  The suggestion was made that open access to data might increase confidence in the results reported in an article and therefore lead more subsequent authors to be willing to rely on that article in their own work.  To me, at least, this idea of increased confidence is the data equivalent of the increasingly common report we hear about how open access to the articles themselves increases and improves coverage of scientific research in the popular media.  More confidence on the “input” side and better understanding on the “output” side make open access a winning proposition for researchers.

Beyond this citation advantage, and the possible reasons for it, the article goes on to discuss some of the difficulties that researchers encounter when they decide to share their underlying data more openly.  These obstacles are especially important for librarians to be aware of, particularly as it is librarians who will frequently be called upon to help researchers develop the data management plans that are now part of the requirements for researchers funded by the National Science Foundation.  Libraries and offices of research support need to become more aware of both the difficulties that researchers might encounter and the benefits they will gain as they consider open access as part of these data management plans.

And finally, in our discussions about open data, Heather talked about another issue that will require the expertise of librarians — the inconsistencies in citing data collections and in tracking those citations.  She suggested three steps that would help make data underlying research articles both easier to find and easier for a researcher to claim credit for.  First, DOIs (digital object identifiers) should be used when citing a data set.  Second, data sets used for a research article should be reported in the reference list, not merely in the acknowledgments, which do not get tracked by databases.  Finally, since some databases strip out references that do not look like traditional articles for their citation reports, library pressure to get data sets treated equally would be very helpful to researchers, for whom the creation of a useful data compilation is a major accomplishment deserving of notice in the promotion and tenure process.
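
For readers who like to see such things concretely, here is a rough sketch of what the first two suggestions might look like in practice: a data set written up as a full reference-list entry that carries its DOI as a resolvable link. The field names, the layout of the entry, and every value in the example (including the DOI) are my own placeholders, not a format prescribed by Heather, Dryad, or any style guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Minimal fields for a data set reference (the field choice is an assumption)."""
    authors: List[str]
    year: int
    title: str
    repository: str
    doi: str  # bare DOI, without the resolver prefix


def format_dataset_reference(c: DatasetCitation) -> str:
    """Render a reference-list entry that ends with the DOI as a link."""
    authors = ", ".join(c.authors)
    return (
        f"{authors} ({c.year}). Data from: {c.title}. "
        f"{c.repository}. https://doi.org/{c.doi}"
    )


# Entirely made-up values, for illustration only.
example = DatasetCitation(
    authors=["Author A", "Author B"],
    year=2011,
    title="An example study",
    repository="Example Data Repository",
    doi="10.5555/example.dataset",
)
print(format_dataset_reference(example))
```

The point of the sketch is only that the data set gets its own entry, with its own persistent identifier, alongside the articles in the reference list; whether a citation database then counts that entry is the third problem, and not one that formatting can solve.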