Category Archives: Data Curation

Share More Data in the Duke Research Data Repository!

We are happy to announce expanded features for the public sharing of large-scale data in the Duke Research Data Repository! Open science for the public good is more important than ever, and scientific research is increasingly happening at scale. Relatedly, journals and funding agencies are requiring researchers to share the data produced during the course of their research (see, for instance, the newly released NIH Data Management and Sharing Policy). In response to this growing and evolving data sharing landscape, the Duke Research Data Repository team has partnered with Research Computing and OIT to integrate the Globus file transfer system, streamlining the public sharing of large-scale data generated at Duke. The new RDR features include:

  • A streamlined workflow for depositing large-scale data to the repository
  • An integrated process for downloading large-scale data (datasets over 2 GB) from the repository
  • New options for exporting smaller datasets directly through your browser
  • New support for describing and using collections to highlight groups of datasets generated by a project or group (see this example)
  • Additional free storage (up to 100 GB per deposit) for the Duke community during 2021!

While using Globus for both upload and download requires end users to complete a few configuration steps, we have strived to simplify this process with new user documentation and video walk-throughs. This is the perfect time to share those large(r) datasets (although smaller datasets are also welcome!).

Contact us today with questions or get started with a deposit!

Publish Your Data: Researcher Highlight

This post was authored by Shadae Gatlin, DUL Repository Services Analyst and member of the Research Data Curation Team.

Collaborating for openness

The Duke University Libraries’ Research Data Curation team has the privilege to collaborate with exceptional researchers and scholars who are advancing their fields through open data sharing in the Duke Research Data Repository (RDR). One such researcher, Martin Fischer, Ph.D., Associate Research Professor in the Departments of Chemistry and Physics, recently discussed his thoughts on open data sharing with us. A trained physicist, Dr. Fischer describes himself as an “optics person”; his work ranges from developing microscopes that can examine melanin in tissues to looking at pigment distribution in artwork. He has published data in the RDR on more than one occasion and says of the data deposit process, “I can only say, it was a breeze.”


Dr. Fischer recalls his first time working with the team as being “much easier than I thought it was going to be.” When Dr. Fischer and colleagues ran into obstacles trying to set up OMERO, a server to host their project data, they turned to the Duke Research Data Repository as a possible solution for storing the data. This was Dr. Fischer’s first foray into open data publishing, and he characterizes the team as responsive and easy to work with. Because of the large size of the data, the team even offered to pick up the hard drive from Fischer’s office. After they acquired the data, the team curated, archived, and published it, resulting in Fischer’s first dataset in the RDR.

Why share data?

When asked why he believes open data sharing is important, Dr. Fischer says that “sharing data creates an opportunity for others to help develop things with you.” For example, after he shared his latest dataset, which evaluates the efficacy of masks in reducing the transmission of respiratory droplets, Fischer received requests for a non-proprietary data analysis option in place of the team’s scripts written for the commercial program Mathematica. Peers offered to help develop a Python script, which is now openly available and for which the developers used the RDR data as a reference. As of January 2021, the dataset has had 991 page views.

Dr. Fischer appreciates the opportunity for research development that open data sharing creates, saying, “Maybe somebody else will develop a routine, or develop something that is better, easier than what we have.” Datasets deposited in the RDR are made publicly available for download and receive a permanent DOI link, which makes the data even more accessible.


In addition to the long-term preservation and access that publishing data in the RDR provides, Dr. Fischer finds that sharing his data openly encourages a sense of accountability: “I don’t have a problem with other people going in and trying, and making sure it’s actually right. I welcome the opportunity for feedback.” With many research funding agencies introducing policies for research data management and data sharing practices, the RDR is a great option for Duke researchers. Every dataset accepted into the RDR is carefully curated to meet the FAIR (Findable, Accessible, Interoperable, and Reusable) guidelines and optimized for future reuse.

Collaborating with researchers like Dr. Martin Fischer is one of the highlights of working on the Research Data Curation team. We look forward to seeing what fascinating data 2021 will bring to the RDR and working with more Duke researchers to share their data with the world.

Dr. Fischer’s Work in the Duke Research Data Repository:

  • Wilson, J. W., Degan, S., Gainey, C. S., Mitropoulos, T., Simpson, M. J., Zhang, J. Y., & Warren, W. S. (2019). Data from: In vivo pump-probe and multiphoton fluorescence microscopy of melanoma and pigmented lesions in a mouse model. Duke Digital Repository. https://doi.org/10.7924/r4cc0zp95
  • Fischer, E., Fischer, M., Grass, D., Henrion, I., Warren, W., & Westman, E. (2020). Video data files from: Low-cost measurement of facemask efficacy for filtering expelled droplets during speech. Duke Research Data Repository. V2. https://doi.org/10.7924/r4ww7dx6q

Got Data? Data Publishing Services at Duke Continue During COVID-19

While the library may be physically closed, the Duke Research Data Repository (RDR) is open and accepting data deposits. If you have a data sharing requirement to meet for a journal publisher or funding agency, we’ve got you covered. If you have COVID-19 data that can be openly shared, we can help make these vital research materials available to the public and the research community today. If your data need to be under access restrictions, we can connect you with partner disciplinary repositories that support clinical trials data, social science data, or qualitative data.

Speaking of the RDR, we just completed a refresh of the platform and added several features!

In line with data sharing standards, we also assign a digital object identifier (DOI) to all datasets, provide structured metadata for discovery, curate data to further enhance datasets for reuse and reproducibility, provide safe archival storage, and supply a standardized citation for proper acknowledgement.

Openness supports the acceleration of science and the generation of knowledge. Within the Libraries, we look forward to partnering with Duke researchers to disseminate their research data! Visit https://research.repository.duke.edu/ to learn more or contact datamanagement@duke.edu with any questions.

Duke University Libraries Partners with the Qualitative Data Repository

Duke University Libraries has partnered with the Qualitative Data Repository (QDR) as an institutional member to provide qualitative data sharing, curation, and preservation services to the Duke community. QDR is located at Syracuse University and has staff and infrastructure in place to specifically address some of the unique needs of qualitative data including curating data for future reuse, providing mediated access, and assisting with Data Use Agreements.

Duke University Libraries has long been committed to helping our scholars make their research openly accessible and stewarding these materials for the future. Over the past few years, this has included launching a new data repository and curation program that accepts data from any discipline, as well as joining the Data Curation Network. Now, through our partnership with QDR, we can further enhance our support for sharing and archiving qualitative data.

Qualitative data come in a variety of forms including interviews, focus groups, archival materials, textual documents, observational data, and some surveys. QDR can help Duke researchers have a broader impact through making these unique data more widely accessible.

“Founded and directed by qualitative researchers, QDR is dedicated to helping researchers share their qualitative data,” says Sebastian Karcher, QDR’s associate director. “Informed by our deep understanding of qualitative research, we help researchers share their data in ways that reflect both their ethical commitments and do justice to the richness and diversity of qualitative research. We couldn’t be more excited to continue our already fruitful partnership with Duke University Libraries.”

Through this partnership, Duke University Libraries will have representation on the governance board of QDR and be involved in the latest developments in managing and sharing qualitative data. The libraries will also be partnering with QDR to provide virtual workshops in the spring semester at Duke to enhance understanding around the sharing and management of qualitative research data.

If you are interested in learning more about this partnership, contact datamanagement@duke.edu.

Introducing Duke Libraries Center for Data and Visualization Sciences

As data-driven research has grown at Duke, Data and Visualization Services receives an increasing number of requests for partnerships, instruction, and consultations. These requests have deepened our relationships with researchers across campus such that we now regularly interact with researchers in all of Duke’s schools, disciplines, and interdepartmental initiatives.

To expand the Libraries’ commitment to partnering with researchers on data-driven research at Duke, Duke University Libraries is elevating the Data and Visualization Services department to the Center for Data and Visualization Sciences (CDVS). The change is designed to enable the new Center to:

  • Expand partnerships for research and teaching
  • Augment the ability of the department to partner on grant, development, and funding opportunities
  • Develop new opportunities for research, teaching, and collections – especially in the areas of data science, data visualization, and GIS/mapping research
  • Recognize the breadth of, and demand for, the Libraries’ expertise in data-driven research support
  • Enhance the role of CDVS activities within Bostock Library’s Edge Research Commons

We believe that the new Center for Data and Visualization Sciences will enable us to partner with an increasingly large and diverse range of data research interests at Duke and beyond through funded projects and co-curricular initiatives. We look forward to working with you on your next data-driven project!

Highlights from Expanding our Research Data Management Program

Since the launch of our expanded research data management (RDM) program in January, the Research Data Management Team in DVS has been busy defining and implementing our suite of services. Our “Lifecycle Services” are designed to assist scholars at all stages of their research project, from the planning phase to the final curation and disposition of their data in an archive or repository. Our service model centers on four key areas: data management planning, data workflow design, data and documentation review, and data repository support. Over the past nine months, we have worked with Duke researchers across disciplines to provide these services, allowing us to see their value in action. Below we present some examples of how we have supported researchers within these four areas.

Data Management Planning

With increasing data management plan requirements, as well as growing expectations that funding agencies will more strictly enforce and evaluate these plans, researchers are seeking assistance in ensuring their plans comply with funder requirements. Through in-person consultations and online review through the DMPTool, we have helped researchers enhance their DMPs for a variety of funding agencies, including the NSF Sociology Directorate, the Department of Energy, and the NSF Computer & Information Science & Engineering (CISE) Program.

Data Workflow Design

As research teams begin a project, there are a variety of organizational and workflow decisions that need to be made, from selecting appropriate tools to implementing storage and backup strategies (to name a few). Over the past six months, we have had the opportunity to help a multi-institutional Duke Marine Lab Behavioral Response Study (BRS) implement their project workflow using the Open Science Framework (OSF). We have worked with project staff to think through the organization of materials, provided training on the use of the tool, and strategized on storage and backup options.

Data and Documentation Review

During a project, researchers make decisions about how to format, describe, and structure their data for sharing and preservation. Questions may also arise surrounding how to ethically share human subjects data and navigate intellectual property or copyright issues. In conversations with researchers, we have provided suggestions for what formats are best for portability and preservation, discussed their documentation and metadata plans, and helped resolve intellectual property questions for secondary data.

Data Repository Support

At the end of a project, researchers may be required, or may choose, to deposit their data in an archive or repository. We have advised faculty and students on repository options based on their discipline, data type, and repository features. One option available to the Duke community is the Duke Digital Repository (DDR). Over the past nine months, we have assisted with the curation of a variety of datasets deposited within the DDR, many of which underlie journal publications.

This year, Duke news articles have featured two research studies with datasets archived within the DDR: one describing a new cervical cancer screening device and another presenting cutting-edge research on a potential new state of matter. The accessibility of both Asiedu et al.’s screening device data and Charbonneau and Yaida’s glass study data enhances the overall transparency and reproducibility of these studies.

Our experiences thus far have enabled us to better understand the diversity of researchers’ needs and allowed us to continue to hone and expand our knowledge base of data management best practices, tools, and resources. We are excited to continue to work with and learn from researchers here at Duke!

Open Science Framework @ Duke

The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science (COS). OSF offers many features that can help scholars manage their workflow and outputs throughout the research lifecycle. From collaborating effectively, to managing data, code, and protocols in a centralized location, to sharing project materials with the broader research community, the OSF provides tools that support openness, research integrity, and reproducibility. Some of the key functionalities of the OSF include:

  • Integrations with third-party tools that researchers already use (e.g., Box, Google Drive, GitHub, and Mendeley)
  • Hierarchical organizational structures
  • Unlimited native OSF storage*
  • Built-in version control
  • Granular privacy and permission controls
  • Activity log that tracks all project changes
  • Built-in collaborative wiki and commenting pane
  • Analytics for public projects
  • Persistent, citable identifiers for projects, components, and files along with Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs) available for public OSF projects
  • And more!

Duke University is a partner institution with OSF, meaning you can sign in to the OSF using your NetID and affiliate your projects with Duke. Visit the Duke OSF page to see some Duke research projects and outputs from our community.

Duke University Libraries has also partnered with COS to host a workshop this fall entitled “Increasing Openness and Reproducibility in Quantitative Research.” This workshop will teach participants how they can increase the reproducibility of their work and will include hands-on exercises using the OSF.

Workshop Details
Date: October 3, 2017
Time: 9 am to 12 pm
Register:
http://duke.libcal.com/event/3433537

If you are interested in affiliating an existing OSF project, want to learn more about how the OSF can support your workflow, or would like a demonstration of the OSF, please contact askdata@duke.edu.

*Individual file size limit of 5 GB. Users can upload larger files by connecting third party add-ons to their OSF projects.

Fall Data and Visualization Workshops


Visualize, manage, and map your data in our Fall 2017 Workshop Series. Our workshops are designed for researchers who are new to data-driven research as well as those looking to expand their skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for Responsible Conduct of Research (RCR) credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro

Python

11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 

Stata

09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 

Love Your Data Week (Feb. 13-17)

In cooperation with the Triangle Research Library Network, Duke Libraries will be participating in Love Your Data Week on February 13-17, 2017. Love Your Data Week is an international event to help researchers take better care of their data. The campaign focuses on raising awareness and building community around data management, sharing, preservation, and reuse.

The theme for Love Your Data Week 2017 is data quality, with a related message for each day.

  • Monday: Defining Data Quality
  • Tuesday: Documenting, Describing, and Defining
  • Wednesday: Good Data Examples
  • Thursday: Finding the Right Data
  • Friday: Rescuing Unloved Data

Throughout the week, Data and Visualization Services will be contributing to the conversation on Twitter (@duke_data). We will also host the following local programming related to the daily themes:

In honor of Love Your Data Week, chocolates will be provided at these workshops!

The new Research Data Management staff at the Duke Libraries are available to help researchers care for their data through consultations, support services, and instruction. We can assist with writing data management plans that comply with funder policies, advise on data management best practices, and facilitate the ingest of data into repositories. To learn more about general data management best practices, see our newly updated RDM guide.

Contact us at askdata@duke.edu to find out how we can help you love your data! 

Get involved in Love Your Data Week by following the conversation at #LYD17, #loveyourdata, and #trlndata.

All promotional Love Your Data 2017 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Bass, M., Neeser, A., Atwood, T., and Coates, H. (2017). Love Your Data Week Promotional Materials. [image files]. Retrieved from https://osf.io/r8tht/files/

New Data Management Services @ Duke

Duke Libraries are happy to announce a new set of research data management services designed to help researchers secure grant funding, increase research impact, and preserve valuable data. Building on the recommendations of the Digital Research Faculty Working Group and the Duke Digital Research Data Services and Support report, Data and Visualization Services has added two new research data management consultants who are available to work with researchers across the university and medical center on a broad range of data management concerns, from data creation to data curation.

Interested in learning more about data management?

Our New Data Management Consultants

Sophia Lafferty-Hess attended the University of North Carolina at Chapel Hill, where she received a Master of Science in Information Science and a Master of Public Administration. Prior to coming to Duke, Sophia worked at the Odum Institute for Research in Social Science at UNC-Chapel Hill within the Data Archive as a Research Data Manager. In this position, Sophia provided consultations to researchers on data management best practices, curated research data to support long-term preservation and reuse, and provided training and instruction on data management policies, strategies, and tools.

While at Odum, Sophia also helped lead the development of a data curation and verification service for journals to help enforce data sharing and replication policies, which included verifying that data meet quality standards for reuse and that the data and code can properly reproduce the analytic results presented in the article. Sophia’s current research interests include the impact of journal data sharing policies on data availability and the development of data curation workflows.

Jen Darragh comes to us from Johns Hopkins University, where she served for the past seven years as the Data Services and Sociology Librarian and Hopkins Population Center Restricted Projects Coordinator. In this position, Jen developed the libraries’ Restricted Data Room and designed the secure data enclave spaces and staff support for the Johns Hopkins Population Center.

Jen received her Bachelor of Arts Degree in Psychology from Westminster College (PA) and her Master of Library and Information Sciences degree from the University of Pittsburgh.  She has been involved with socio-behavioral research data throughout her career.  Jen is particularly interested in the development of centralized, controlled data access for sensitive human subjects’ data (subject to HIPAA or FERPA requirements) to facilitate broader, yet more secure sharing of existing research data as a means to produce new, cutting-edge research.