All posts by Jen Darragh

Local Data Repository Infrastructure Matters

Over the past seven years, the Duke University Libraries Research Data Curation Program and the Duke Research Data Repository (RDR) have developed into essential resources for the Duke research community. What began as a “proof of concept” turned into a robust, well-used data publishing service meeting both publisher and funder requirements and furthering Duke’s commitment to a “culture of open science and open scholarship to ensure transparency and accountability in research.”

The Duke Research Data Repository (RDR) has published over 310 datasets in a myriad of scientific disciplines including chemistry, biology, biomedical engineering, marine science, and medicine. Additionally, the RDR hosts datasets associated with articles published in PLOS, Nature, and PNAS and funded by NIH and NSF, their multiple sub-agencies and institutes, and others. The RDR provides a DOI for all datasets, commits to long-term access and retention of data, and curates all data based on the Data Curation Network (DCN) CURATED model. As members of the renowned DCN, we draw on the expertise of a multi-institutional consortium that shares knowledge to increase the quality of data curation across all DCN-affiliated repositories. The Duke Libraries are proud to be able to offer this local resource that supports the needs of Duke researchers who are not sufficiently served by disciplinary, data-type, or funder-based resources. In addition to providing the platform, we also provide front-line services for data management planning, data curation, disclosure risk review, and referrals.

As the culture and landscape of data sharing evolve, researchers have many different repository options – from funder-sponsored repositories to discipline- or community-specific repositories to generalist repositories. Occasionally, journal publishers and funding agency Program Officers have questioned the suitability of institutional data repositories for long-term data sharing and preservation. To communicate the value of institutional resources focused on data sharing, we and other members of the DCN collaboratively wrote a letter to Science arguing that institutional data repositories provide valuable local infrastructure for researchers needing to meet data publishing guidelines.

In this letter, we detail how our repositories align with the FAIR guiding principles and commit to providing sustainable access to data critical for research reproducibility. The letter, Institutional Data Repositories are Vital, was published in the September 13 issue of Science (DOI: 10.1126/science.adr0789; an open access copy is also available). The DCN has also published research that examined what researchers valued about their institutional data repositories and the services they provide. As one researcher noted:

“I am thankful and excited for the help in curation…I see that teamwork in this final step of research means that the best possible version of the material will be available to future generations.”

All this to say, should you, as a researcher, need to demonstrate that an institutional data repository is an acceptable strategy for sharing data, we encourage you to cite the Science letter and reference the Duke Research Data Repository’s documentation clarifying how the RDR approaches compliance with the NIH Desirable Characteristics for Data Repositories. Comments or questions about the RDR can be sent to

Publications referenced:

Darragh J, et al. (2024). Institutional data repositories are vital. Science 385: 1174. DOI: 10.1126/science.adr0789

Marsolek W, Wright SJ, Luong H, Braxton SM, Carlson J, Lafferty-Hess S (2023). Understanding the value of curation: A survey of researcher perspectives of data curation services from six US institutions. PLoS ONE 18 (11): e0293534.

The Duke Research Data Repository Celebrates its 200th Data Deposit!

The Curation Team for the Duke Research Data Repository is happy to present an interview with Dr. Thomas Struhsaker, Retired Adjunct Professor of Evolutionary Anthropology.

CC-BY Thomas Struhsaker, Medium Juvenile Eating Charcoal, July 1994, Jozani

Dr. Struhsaker’s dataset, Digitized tape recordings of Red colobus and other African forest monkey species vocalizations, was the 200th dataset to be added to the Duke Research Data Repository. I worked closely with Tom to arrange and describe this collection. He hopes to be adding even more in the near future as he winds down his career. Tom might not know this, but his dataset has been tweeted about 36 times at this point and has been viewed 336 times since August. Tom is ever the humble scientist: I did not know until I saw the tweets that he was the winner of the 2022 President’s Award from the American Society of Primatologists (congratulations Tom!).

I started my interview with Dr. Struhsaker as one typically would – by asking him to tell me about himself and his field of research. He laughs and says “Oh boy, where to begin? You’re talking half a century here.” I could listen to Tom talk for hours about his experiences as a young field biologist at a time when primatology was just figuring itself out. Tom went about his work as a naturalist – do not interfere, observe and learn. He spent 25 years in Africa (spanning 56 years from 1962-2018), observing many different species of animals, not just primates. For 18 of these 25 years Tom lived in Uganda as a full-time resident, including during the reign of Idi Amin, one of the most brutal rulers in modern history. Idi Amin aside, Tom thought that the Ugandans were some of the best folks to work with regarding conservation in Africa due to their dedication to higher education (Makerere University) with growing generations of students and the establishment of Kibale National Park. I cannot do Tom’s fascinating life justice in just this short blog post, so I encourage you to read Tom’s 2022 article, The life of a naturalist (full text access available through NetID login) and his memoir, I remember Africa: A field biologist’s half-century perspective (Perkins & Bostock Library – Duke Authors Display – QH31.S79 A3 2021); if you would like your own copy of the memoir, it is also available for purchase. What I can tell you, at least from my perspective, is that Tom has led a life passionate about nature, wanting to learn everything he could from our cohabiters on this planet and how we can best live together.

Tom recorded these vocalizations between 1969 and 1992. He thought it was really important to do so because they are key to understanding communication and the social life of primates. Analysis of these recordings led Tom to conclude that, among African monkeys, vocalizations are relatively stable characters from an evolutionary perspective and, therefore, important in understanding phylogenetic relationships. As for archiving and sharing the recordings of these vocalizations, Tom didn’t initially have that in mind. He instead followed the more traditional academic route of publishing articles that included spectrograms and his conclusions about the meaning of the vocalizations. Over the last two years, as Tom began thinking about the legacy of his materials, he realized that while the visual representations are useful to share for analysis, it is just not the same as listening to the sounds themselves. Why not archive them to make it possible for others to hear them?

With increasing human populations, deforestation, climate change, etc., some of these animals (like the Red Colobus) have become critically endangered, and these recordings might be the only way future generations will ever be able to hear these animals. Tom’s recordings were made using reel-to-reel tapes on very large and heavy tape recorders with 12 D-cell batteries. Crawling through the forest with these machines in addition to a large boom microphone was no easy feat. With the help of the Macaulay Library (Mr. Matthew Medler in particular), several of the original tapes were digitized to the high-quality WAV files we have in the collection. Tom has also augmented the collection with his own MP3 recordings. He hopes to have more WAV-format files from Macaulay Library in the future.

Tom did not initially know where to archive these vocalizations as they weren’t in scope for MorphoSource (another Duke-based repository for 3D imaging) where Tom will soon have a collection of red colobus monkey images available. Thanks to a suggestion from his neighbor Ben Donnelly, he reached out to the Duke Research Data Repository Curation Team (thanks for being a great colleague Ben!). This is where I (Jen Darragh), the author, come in.

Tom and I worked together over the course of a couple of months to build his data deposit. Perhaps somewhat self-servingly, I asked him how he found the process. He stoked my ego with both “fantastic” and “easy peasy.” He said he would recommend us to anyone as we do our best to make the process as clear and pain-free as possible. Aw shucks Tom. You are one of my favorite depositors to work with, too.

I asked Tom what he would advise for early career researchers and those just getting started in the field when it comes to data sharing and archiving. He said that he is seeing increasing requirements as part of publishing (he’s right) and he’s in favor, as long as the person who collected the data is credited (cite properly!) and consulted when possible (collaboration is good). It’s important to advance the sciences. Repositories help to encourage good citation practices in addition to the preservation of important data for the long term.

CC-BY Thomas Struhsaker. Medium-large juvenile red colobus eating bark of bottle brush tree, Kanyawara, Kibale National Park, Uganda.

Tom also mentioned some longitudinal data he had collaboratively built over the years with colleagues and that continues to be built upon. His experience of archiving his vocalization recordings with us (and his images with MorphoSource) got him thinking that repositories are a wonderful option to ensure that these important materials continue to persist and be used. He has thought of at least three important datasets and plans to reach out to his collaborators about archiving these data either with us in the Duke RDR, or in another formal repository of their choosing.

Tom recently shared with me a collection of photographs that he has taken in the same spot in Kibale from 1976-2018, showing how the area went from bare grassland to a low-stature forest (pre-conservation to post-conservation efforts). He has shared these with his colleagues directly to show the fascinating change over time. He now hopes to share them more broadly through the Duke RDR (forthcoming, we have some processing to do). Perhaps someone will be inspired to animate the images and then share the result back with us.

To close the interview, I asked Tom what his favorite animal was. I think it’s no surprise that he likes them all; there are so many he likes for different reasons, some subtle, some not (“some insects are damn weird”) and some just do incredibly interesting things. The diversity is what he loves.

Struhsaker, T. T. (2022). Digitized tape recordings of Red colobus and other African forest monkey species vocalizations. Duke Research Data Repository.


Where can I find data (or statistics) on ___________?

Helping Duke students, staff and faculty to locate data is something that we in Data and Visualization Services often do.  In this blog post I will walk you through a sample search and share some tips that I use when I search for data and statistics.

“Hi there, I am looking for motorcycle registration numbers and sales volumes by age and sex for the United States.”


There are two types of data needed: motorcycle registration data and motorcycle sales data. There are two criteria that the data should be differentiated by: owner’s age and owner’s gender.
There is a geographic component: United States.

One criterion that is not given is time. When a time frame isn’t provided, I assume that what is needed is the most current data available. Something to consider is that “current” often will still be a year or more old. It takes time for data to be gathered, cleaned and published.

***Pro-tip: When you are looking for data consider who/what/when and where – adding in those components makes it easier to construct your search.***


If I do not immediately have a source in mind (and sometimes even if I do, just to hit all the bases) I will use Google and structure my search as follows: motorcycle sales and registration by age and gender united states.
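As a rough illustration only, the who/what/when/where breakdown can be sketched as a small Python helper that assembles the search string. The function name `build_query` and its parameters are hypothetical, made up for this example; they are not part of any search tool.

```python
def build_query(what, who=None, where=None, when=None):
    """Join the what/who/where/when parts of a data question into one search string."""
    parts = [what]
    if who:
        # "who" holds the criteria the data should be broken down by
        parts.append("by " + " and ".join(who))
    if where:
        parts.append(where)
    if when:
        # Leave "when" out to look for the most current data available
        parts.append(str(when))
    return " ".join(parts)


query = build_query(
    what="motorcycle sales and registration",
    who=["age", "gender"],
    where="united states",
)
print(query)  # motorcycle sales and registration by age and gender united states
```

Writing the components out separately makes it easy to swap one of them (a different geography, or an explicit year) without rethinking the whole search.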

***Pro-tip: You can use Google (or search engine of your choice) to search across things we subscribe to and the open Web, but you will need to be connected via a Duke IP address.***


One of the first results returned is from a database we subscribe to called Statista. This source gives me the number of motorcycle owners by age in 2018, which answers part of the question, but does not include sales information or gender breakdown.

Another top result is a report on Motorcycle Trends in the United States from the Bureau of Transportation Statistics (BTS). Unfortunately, the report is from 2009 and the data cited in the article are from 2003-2007. A search of the BTS site does not yield anything more current. However, when I check the source list at the bottom of the report, there are several listed that I will check directly once I’ve finished looking through my search results.

***Pro-tip: Always look for sources of data in reports and figures, even if the data are old. Heading to the source can often yield more current information.***

A third result that looks promising is from a motorcycling magazine: Motorcycle Statistics in America: Demographics Change for 2018. The article reports on statistics from the 2018 owner surveys conducted by the Motorcycle Industry Council (which is one of the sources that the Bureau of Transportation Statistics report listed). This article provides the percentage of males and females that own motorcycles as well as the median age of motorcycle owners. While this is pretty close to the data needed, it is worthwhile to look into the Motorcycle Industry Council. Experience has taught me, however, that industry data typically is neither open nor freely available.


When I go to the Motorcycle Industry Council (MIC) Web site I find that they do, indeed, have a statistical report that comes out every year which gives a comprehensive overview of the motorcycle industry.  If you are not a member, you can buy a copy of the report, but it is expensive (nearly $500).

***Pro-tip: Always check the original source even if you anticipate that there may be a paywall – it’s a good idea to evaluate all sources to ensure that they are credible and authoritative.***


In this instance, I would ultimately advise the person to use the statistics reported in the article Motorcycle Statistics in America: Demographics Change for 2018. Secondary sources aren’t ideal, and can sometimes be complicated to cite, but when you can’t get access to the primary source and that primary source is the authority, it is your best bet.

***Pro-tip: If you are using a secondary source, you should name the original source in text, for example: Data from the 2018 Motorcycle Industry Council Owner Survey (as cited by Ultimate Motorcycling, 2019). In your reference list, include a citation to the secondary source, formatted according to the style you are using.***


In closing, the data you want might not always be the data you use, either because the data are proprietary or restricted, or because they simply don’t exist, or don’t exist in a form you need and/or are able to use. When this happens, take a moment to think on your research question and determine if you have the time and the resources needed to continue pursuing your question as it stands (purchasing, requesting, applying for, or collecting your own data), or if you need to broaden or change your focus to incorporate the resources you do find in a meaningful way.