Category Archives: Data Curation

Dr. Mark Palmeri: An honest assessment of openness

This post is part of the Duke Research Data Curation Team’s ‘Researcher Highlight’ series.

In the field of engineering, a key driving motivator is the urge to solve problems and provide tools to the community to address those problems. For Dr. Mark Palmeri, Professor in Biomedical Engineering at Duke University, open research practices support the ultimate goals of this work and help get the data into the hands of those solving problems: “It’s one thing to get a publication out there and see it get cited. It’s totally another thing to see people you have no direct professional connection to accessing the data and see it impacting something they’re doing…”

Dr. Palmeri’s research focuses on medical ultrasonic imaging, specifically using acoustic radiation force imaging to characterize the stiffness of tissues. His code and data allow other researchers to calibrate and validate processing protocols, and facilitate training of deep learning algorithms. He recently sat down with the Duke Research Data Repository Curation Team to discuss his thoughts on open science and data publishing.

With the new NIH data management and sharing policy on the horizon, many researchers are now considering what sharing data looks like for their own work. Palmeri highlighted some common challenges that many researchers will face, such as the inability to share proprietary data when working with industry partners, de-identifying data for public use (and who actually signs off on this process), the growing scope and scale of data in the digital age, and investing the necessary time to prepare data for public consumption. However, two of his biggest challenges relate to the changing pace of technology and the lack of data standards.

When publishing a dataset, you necessarily have a static version of the dataset established in space and time via a persistent identifier (e.g., a DOI); however, Palmeri’s code and software outputs are constantly evolving, as are the underlying computational environments. This mismatch can leave datasets out of sync with the coding tools, which affects future reuse, and keeping things up to date takes time and effort. As Palmeri notes, in the fast-paced culture of academia, “no one has time to keep old project data up to snuff.”

Likewise, while certain types of data in medical imaging have standardized formats (e.g., DICOM), for the images Palmeri is creating from raw signal data there are no ubiquitous standards. This creates problems for data reuse. Palmeri remarks that “There’s no data model that exists to say what metadata should be provided, in what units, what major fields and subfields, so that becomes a major strain on the ability to meaningfully share the data, because if someone can’t open it up and know how to parse it and unwrap it and categorize it, you’re sharing gigabytes of bits that don’t really help anyone.” Currently, Dr. Palmeri is working with the Quantitative Imaging Biomarkers Alliance and the International Electrotechnical Commission (IEC) TC87 (Ultrasonics) WG9 (Shear Wave Elastography) to create a public standard for this technology for clinical use.

Figure: Ultrasound scanner images (image processing example using MimickNet)

Regardless of these challenges, Palmeri sees many benefits to publicly sharing data, including enhancing “our internal rigor even just that little bit more” as well as opening “new doors of opportunity for new research questions…and then the scope and impact of the work can be augmented.” Dr. Palmeri appreciates the infrastructure provided by the Duke University Libraries to host his data in a centralized and distributed network as well as the ability to cite his data via the DOI. As he notes, “you don’t want to just put up something on Box as those services can change year to year and don’t provide a really good preserved resource.” Beyond the infrastructure, he appreciates how the curation team provides “an objective third party [to] look at things and evaluate how shareable is this.”

Within the Duke Research Data Repository, we have a mission to help Duke researchers make their data accessible to enable reproducibility and reuse. Working with researchers like Dr. Palmeri to realize a future where open research practices lead to a greater impact for researchers and democratize knowledge is a core driving motivator. Contact us (datamanagement@duke.edu) with any questions you might have about starting your own data sharing adventure!

Share More Data in the Duke Research Data Repository!

We are happy to announce expanded features for the public sharing of large scale data in the Duke Research Data Repository! Open science for the public good is more important than ever, and scientific research is increasingly happening at scale. Relatedly, journals and funding agencies are requiring researchers to share the data produced during the course of their research (for instance, see the newly released NIH Data Management and Sharing Policy). In response to this growing and evolving data sharing landscape, the Duke Research Data Repository team has partnered with Research Computing and OIT to integrate the Globus file transfer system to streamline the public sharing of large scale data generated at Duke. The new RDR features include:

  • A streamlined workflow for depositing large scale data to the repository
  • An integrated process for downloading large scale data (datasets over 2GB) from the repository
  • New options for exporting smaller datasets directly through your browser
  • New support for describing and using collections to highlight groups of datasets generated by a project or group (see this example)
  • Additional free storage (up to 100 GB per deposit) to the Duke community during 2021!

While using Globus for both upload and download requires a few configuration steps by end users, we have strived to simplify this process with new user documentation and video walk-throughs. This is the perfect time to share those large(r) datasets (although smaller datasets are also welcome!).

Contact us today with questions or get started with a deposit!

Publish Your Data: Researcher Highlight

This post was authored by Shadae Gatlin, DUL Repository Services Analyst and member of the Research Data Curation Team.

Collaborating for openness

The Duke University Libraries’ Research Data Curation team has the privilege to collaborate with exceptional researchers and scholars who are advancing their fields through open data sharing in the Duke Research Data Repository (RDR). One such researcher, Martin Fischer, Ph.D., Associate Research Professor in the Departments of Chemistry and Physics, recently discussed his thoughts on open data sharing with us. A trained physicist, Dr. Fischer describes himself as an “optics person”; his work ranges from developing microscopes that can examine melanin in tissues to looking at pigment distribution in artwork. He has published data in the RDR on more than one occasion and says of the data deposit process, “I can only say, it was a breeze.”

Dr. Fischer recalls his first time working with the team as being “much easier than I thought it was going to be.” When Dr. Fischer and colleagues experienced obstacles trying to set up OMERO, a server to host their project data, they turned to the Duke Research Data Repository as a possible solution for storing the data. This was Dr. Fischer’s first foray into open data publishing, and he characterizes the team as being responsive and easy to work with. Due to the large size of the data, the team even offered to pick up the hard drive from Fischer’s office. After they acquired the data, the team curated, archived, and then published it, resulting in Fischer’s first dataset in the RDR.

Why share data?

When asked why he believes open data sharing is important, Dr. Fischer says that “sharing data creates an opportunity for others to help develop things with you.” For example, after sharing his latest dataset, which evaluates the efficacy of masks to reduce the transmission of respiratory droplets, Fischer received requests for a non-proprietary option for data analysis instead of using the team’s data analysis scripts written for the commercial program Mathematica. Peers offered to help develop a Python script, which is now openly available, and for which the developers used the RDR data as a reference. As of January 2021, the dataset has had 991 page views.

Dr. Fischer appreciates the opportunity for research development that open data sharing creates, saying, “Maybe somebody else will develop a routine, or develop something that is better, easier than what we have.” Datasets deposited in the RDR are made publicly available for download and receive a permanent DOI link, which makes the data even more accessible.

In addition to the benefits of long-term preservation and access that publishing data in the RDR provides, Dr. Fischer finds that sharing his data openly encourages a sense of accountability. “I don’t have a problem with other people going in and trying, and making sure it’s actually right. I welcome the opportunity for feedback.” With many research funding agencies introducing policies for research data management and data sharing practices, the RDR is a great option for Duke researchers. Every dataset that is accepted into the RDR is carefully curated to meet FAIR guidelines and optimized for future reuse.

Collaborating with researchers like Dr. Martin Fischer is one of the highlights of working on the Research Data Curation team. We look forward to seeing what fascinating data 2021 will bring to the RDR and working with more Duke researchers to share their data with the world.

Dr. Fischer’s Work in the Duke Research Data Repository:

  • Wilson, J. W., Degan, S., Gainey, C. S., Mitropoulos, T., Simpson, M. J., Zhang, J. Y., & Warren, W. S. (2019). Data from: In vivo pump-probe and multiphoton fluorescence microscopy of melanoma and pigmented lesions in a mouse model. Duke Digital Repository. https://doi.org/10.7924/r4cc0zp95
  • Fischer, E., Fischer, M., Grass, D., Henrion, I., Warren, W., & Westman, E. (2020). Video data files from: Low-cost measurement of facemask efficacy for filtering expelled droplets during speech. Duke Research Data Repository. V2. https://doi.org/10.7924/r4ww7dx6q

Got Data? Data Publishing Services at Duke Continue During COVID-19

While the library may be physically closed, the Duke Research Data Repository (RDR) is open and accepting data deposits. If you have a data sharing requirement you need to meet for a journal publisher or funding agency, we’ve got you covered. If you have COVID-19 data that can be openly shared, we can help make these vital research materials available to the public and the research community today. Or if you have data that needs to be under access restrictions, we can connect you to partner disciplinary repositories that support clinical trials data, social science data, or qualitative data.

Speaking of the RDR, we just completed a refresh on the platform and added several features!

In line with data sharing standards, we also assign a digital object identifier (DOI) to all datasets, provide structured metadata for discovery, curate data to further enhance datasets for reuse and reproducibility, provide safe archival storage, and supply a standardized citation for proper acknowledgement.

Openness supports the acceleration of science and the generation of knowledge. Within the libraries we look forward to partnering with Duke researchers to disseminate their research data! Visit https://research.repository.duke.edu/ to learn more or contact datamanagement@duke.edu with any questions.

Duke University Libraries Partners with the Qualitative Data Repository

Duke University Libraries has partnered with the Qualitative Data Repository (QDR) as an institutional member to provide qualitative data sharing, curation, and preservation services to the Duke community. QDR is located at Syracuse University and has staff and infrastructure in place to specifically address some of the unique needs of qualitative data including curating data for future reuse, providing mediated access, and assisting with Data Use Agreements.

Duke University Libraries has long been committed to helping our scholars make their research openly accessible and stewarding these materials for the future. Over the past few years, this has included launching a new data repository and curation program, which accepts data from any discipline, as well as joining the Data Curation Network. Now, through our partnership with QDR, we can further enhance our support for sharing and archiving qualitative data.

Qualitative data come in a variety of forms including interviews, focus groups, archival materials, textual documents, observational data, and some surveys. QDR can help Duke researchers have a broader impact through making these unique data more widely accessible.

“Founded and directed by qualitative researchers, QDR is dedicated to helping researchers share their qualitative data,” says Sebastian Karcher, QDR’s associate director. “Informed by our deep understanding of qualitative research, we help researchers share their data in ways that reflect both their ethical commitments and do justice to the richness and diversity of qualitative research. We couldn’t be more excited to continue our already fruitful partnership with Duke University Libraries.”

Through this partnership, Duke University Libraries will have representation on the governance board of QDR and be involved in the latest developments in managing and sharing qualitative data. The libraries will also be partnering with QDR to provide virtual workshops in the spring semester at Duke to enhance understanding around the sharing and management of qualitative research data.

If you are interested in learning more about this partnership, contact datamanagement@duke.edu.

Introducing Duke Libraries Center for Data and Visualization Sciences

As data-driven research has grown at Duke, Data and Visualization Services has received an increasing number of requests for partnerships, instruction, and consultations. These requests have deepened our relationships with researchers across campus such that we now regularly interact with researchers in all of Duke’s schools, disciplines, and interdepartmental initiatives.

In order to expand the Libraries’ commitment to partnering with researchers on data-driven research at Duke, Duke University Libraries is elevating the Data and Visualization Services department to the Center for Data and Visualization Sciences (CDVS). The change is designed to enable the new Center to:

  • Expand partnerships for research and teaching
  • Augment the ability of the department to partner on grant, development, and funding opportunities
  • Develop new opportunities for research, teaching, and collections – especially in the areas of data science, data visualization, and GIS/mapping research
  • Recognize the breadth of and demand for the Libraries’ expertise in data-driven research support
  • Enhance the role of CDVS activities within Bostock Libraries’ Edge Research Commons

We believe that the new Center for Data and Visualization Sciences will enable us to partner with an increasingly large and diverse range of data research interests at Duke and beyond through funded projects and co-curricular initiatives at Duke. We look forward to working with you on your next data driven project!

Highlights from Expanding our Research Data Management Program

Since the launch of our expanded research data management (RDM) program in January, the Research Data Management Team in DVS has been busy defining and implementing our suite of services. Our “Lifecycle Services” are designed to assist scholars at all stages of their research project, from the planning phase to the final curation and disposition of their data in an archive or repository. Our service model centers on four key areas: data management planning, data workflow design, data and documentation review, and data repository support. Over the past nine months, we have worked with Duke researchers across disciplines to provide these services, allowing us to see their value in action. Below we present some examples of how we have supported researchers within our four support areas.

Data Management Planning

With increasing data management plan requirements as well as growing expectations that funding agencies will more strictly enforce and evaluate these plans, researchers are seeking assistance ensuring their plans comply with funder requirements. Through in-person consultations and online review through the DMPTool, we have helped researchers enhance their DMPs for a variety of funding agencies including the NSF Sociology Directorate, the Department of Energy, and the NSF Computer & Information Science & Engineering (CISE) Program.

Data Workflow Design

As research teams begin a project, there are a variety of organizational and workflow decisions that need to be made, from selecting appropriate tools to implementing storage and backup strategies (to name a few). Over the past 6 months, we have had the opportunity to help a multi-institutional Duke Marine Lab Behavioral Response Study (BRS) implement their project workflow using the Open Science Framework (OSF). We have worked with project staff to think through the organization of materials, provided training on the use of the tool, and strategized on storage and backup options.

Data and Documentation Review

During a project, researchers make decisions about how to format, describe, and structure their data for sharing and preservation. Questions may also arise surrounding how to ethically share human subjects data and navigate intellectual property or copyright issues. In conversations with researchers, we have provided suggestions for what formats are best for portability and preservation, discussed their documentation and metadata plans, and helped resolve intellectual property questions for secondary data.

Data Repository Support

At the end of a project, researchers may be required or choose to deposit their data in an archive or repository. We have advised faculty and students on repository options based on their discipline, data type, and repository features. One option available to the Duke community is the Duke Digital Repository (DDR). Over the past nine months, we have assisted with the curation of a variety of datasets deposited within the DDR, many of which underlie journal publications.

This year Duke news articles have featured two research studies with datasets archived within the DDR, one describing a new cervical cancer screening device and another presenting cutting-edge research on a potential new state of matter. The accessibility of both Asiedu et al.’s screening device data and Charbonneau and Yaida’s glass study data enhances the overall transparency and reproducibility of these studies.

Our experiences thus far have enabled us to better understand the diversity of researchers’ needs and allowed us to continue to hone and expand our knowledge base of data management best practices, tools, and resources. We are excited to continue to work with and learn from researchers here at Duke!

Open Science Framework @ Duke

The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science (COS). OSF offers many features that can help scholars manage their workflow and outputs throughout the research lifecycle. From collaborating effectively, to managing data, code, and protocols in a centralized location, to sharing project materials with the broader research community, the OSF provides tools that support openness, research integrity, and reproducibility. Some of the key functionalities of the OSF include:

  • Integrations with third-party tools that researchers already use (e.g., Box, Google Drive, GitHub, and Mendeley)
  • Hierarchical organizational structures
  • Unlimited native OSF storage*
  • Built-in version control
  • Granular privacy and permission controls
  • Activity log that tracks all project changes
  • Built-in collaborative wiki and commenting pane
  • Analytics for public projects
  • Persistent, citable identifiers for projects, components, and files along with Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs) available for public OSF projects
  • And more!

Duke University is a partner institution with OSF, meaning you can sign into the OSF using your NetID and affiliate your projects with Duke. Visit the Duke OSF page to see some Duke research projects and outputs from our community.

Duke University Libraries has also partnered with COS to host a workshop this fall entitled “Increasing Openness and Reproducibility in Quantitative Research.” This workshop will teach participants how they can increase the reproducibility of their work and will include hands-on exercises using the OSF.

Workshop Details
Date: October 3, 2017
Time: 9 am to 12 pm
Register:
http://duke.libcal.com/event/3433537

If you are interested in affiliating an existing OSF project, want to learn more about how the OSF can support your workflow, or would like a demonstration of the OSF, please contact askdata@duke.edu.

*Individual file size limit of 5 GB. Users can upload larger files by connecting third party add-ons to their OSF projects.

Fall Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series. Our workshops are designed for researchers who are new to data-driven research as well as those looking to expand their skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series covers a wide range of data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops. We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro

Python

11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 

Stata

09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata

Love Your Data Week (Feb. 13-17)

In cooperation with the Triangle Research Library Network, Duke Libraries will be participating in Love Your Data Week on February 13-17, 2017. Love Your Data Week is an international event to help researchers take better care of their data. The campaign focuses on raising awareness and building community around data management, sharing, preservation, and reuse.

The theme for Love Your Data Week 2017 is data quality, with a related message for each day.

  • Monday: Defining Data Quality
  • Tuesday: Documenting, Describing, and Defining
  • Wednesday: Good Data Examples
  • Thursday: Finding the Right Data
  • Friday: Rescuing Unloved Data

Throughout the week, Data and Visualization Services will be contributing to the conversation on Twitter (@duke_data). We will also host the following local programming related to the daily themes:

In honor of Love Your Data Week, chocolates will be provided at these workshops!

The new Research Data Management staff at the Duke Libraries are available to help researchers care for their data through consultations, support services, and instruction. We can assist with writing data management plans that comply with funder policies, advise on data management best practices, and facilitate the ingest of data into repositories. To learn more about general data management best practices, see our newly updated RDM guide.

Contact us at askdata@duke.edu to find out how we can help you love your data! 

Get involved in Love Your Data Week by following the conversation at #LYD17, #loveyourdata, and #trlndata.

All promotional Love Your Data 2017 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Bass, M., Neeser, A., Atwood, T., and Coates, H. (2017). Love Your Data Week Promotional Materials. [image files]. Retrieved from https://osf.io/r8tht/files/