All posts by Sophia Lafferty-Hess

Dr. Mark Palmeri: An honest assessment of openness

This post is part of the Duke Research Data Curation Team’s ‘Researcher Highlight’ series.

In the field of engineering, a key driving motivator is theDr. Mark Palmeri urge to solve problems and provide tools to the community to address those problems. For Dr. Mark Palmeri, Professor in Biomedical Engineering at Duke University, open research practices support the ultimate goals of this work, and helps get the data into the hands of those solving problems: “It’s one thing to get a publication out there and see it get cited. It’s totally another thing to see people you have no direct professional connection to accessing the data and see it impacting something they’re doing…”

Dr. Palmeri’s research focuses on medical ultrasonic imaging, specifically using acoustic radiation force imaging to characterize the stiffness of tissues. His code and data allow other researchers to calibrate and validate processing protocols, and facilitate training of deep learning algorithms. He recently sat down with the Duke Research Data Repository Curation Team to discuss his thoughts on open science and data publishing.

“It’s one thing to get a publication out there and see it get cited. It’s totally another thing to see people you have no direct professional connection to accessing the data and see it impacting something they’re doing…”

With the new NIH data management and sharing policy on the horizon, many researchers are now considering what sharing data looks like for their own work. Palmeri highlighted some common challenges that many researchers will face, such as the inability to share proprietary data when working with industry partners, de-identifying data for public use (and who actually signs off on this process), the growing scope and scale of data in the digital age, and investing the necessary time to prepare data for public consumption. However, two of his biggest challenges relate to the changing pace of technology and the lack of data standards.

ClockWhen publishing a dataset, you necessarily have a static version of the dataset established in space and time via a persistent identifier (i.e., DOI); however, Palmeri’s code and software outputs are constantly evolving as are the underlying computational environments. This mismatch can result in datasets becoming out of sync with the coding tools, thereby affecting future reuse and ultimately keeping things up-to-date takes time and effort. As Palmeri notes, in the fast-paced culture of academia “no one has time to keep old project data up to snuff.”

Likewise, while certain types of data in medical imaging have standardized formats (e.g., DICOM), for the images Palmeri is creating from raw signal data there are no ubiquitous standards. This creates problems for data reuse. Palmeri remarks that “There’s no data model that exists to say what metadata should be provided, in what units, what major fields and subfields, so that becomes a major strain on the ability to meaningfully share the data, because if someone can’t open it up and know how to parse it and unwrap it and categorize it, you’re sharing gigabytes of bits that don’t really help anyone.” Currently, Dr. Palmeri is working with the Quantitative Imaging Biomarkers Alliance and the International Electrotechnical Commision (IEC) TC87 (Ultrasonics) WG9 (Shear Wave Elastography) to create a public standard for this technology for clinical use.

Ultrasound scanner images
Image processing example using MimickNet

Regardless of these challenges, Palmeri sees many benefits to publicly sharing data including enhancing “our internal rigor even just that little bit more” as well as opening “new doors of opportunity for new research questions…and then the scope and impact of the work can be augmented.” Dr. Palmeri appreciates the infrastructure provided by the Duke University Libraries to host his data in a centralized and distributed network as well as the ability to cite his data via the DOI. As he notes “you don’t want to just put up something on Box as those services can change year to year and don’t provide a really good preserved resource.” Beyond the infrastructure, he appreciates how the curation team provides “an objective third party [to] look at things and evaluate how shareable is this.”

“you don’t want to just put up something on Box as those services can change year to year and don’t provide a really good preserved resource.”

Within the Duke Research Data Repository, we have a mission to help Duke researchers make their data accessible to enable reproducibility and reuse. Working with researchers, like Dr. Palmeri, to realize a future where open research practices lead to a greater impact for researchers and democratizes knowledge is a core driving motivator. Contact us (datamanagement@duke.edu) with any questions you might have about starting your own data sharing adventure!

5 CDVS Online Learning Things

Within the Center for Data and Visualization Sciences (CDVS) we pride ourselves on providing numerous educational opportunities for the Duke community. Like many others during the COVID-19 pandemic, we have spent a large amount of time considering how to translate our in-person workshops to online learning experiences, explored the use of flipped classroom models, and learned together about the wonderful (and sometimes not so wonderful) features of common technology platforms (we are talking about you, Zoom).

Online learning setupWe also wanted to more easily surface the various online learning resources we have developed over the years via the web. Recognizing that learning takes place both synchronously and asynchronously, we have made available numerous guides, slide decks, example datasets, and both short-form and full-length workshops on our Online Learning Page. Below we highlight 5 online learning resources that we thought others interested in data driven research may wish to explore:

  • Mapping & GIS: R has become a popular and reproducible option for mapping and spatial analysis. Our Geospatial Data in R guide and workshop video introduce the use of the R language for producing maps. We cover the advantages of a code-driven approach such as R for visualizing geospatial data and demonstrate how to quickly and efficiently create a variety of map types for a website, presentation, or publication. 
  • Data Visualization: Visualization is a powerful way to reveal patterns in data, attract attention, and get your message across to an audience quickly and clearly. But, there are many steps in that journey from exploration to information to influence, and many choices to make when putting it all together to tell your story. In our Effective Data Visualization workshop, we cover some basic guidelines for effective visualization, point out a few common pitfalls to avoid, and run through a critique and iterations of an existing visualization to help you start seeing better choices beyond the program defaults.
  • Data ScienceQuickStart with R is our beginning data science module focusing on the Tidyverse — a data-first approach to data wrangling, analysis, and visualization.  Beyond introducing the Tidyverse approach to reproducible data workflows, we offer a rich allotment of other R learning resources at our Rfun site: workshop videos, case studies, shareable data, and code. Links to all our data science materials can also be found collated on our Online Learning page (above).
  • Data Management: Various stakeholders are stressing the importance of practices that make research more open, transparent, and reproducible including NIH who has released a new data management & sharing policy. In collaboration with the Office of Scientific Integrity, our Meeting Data Management Plan Requirements workshop presents details on the new NIH policy, describes what makes a strong plan, and where to find guidance, tools, resources, and assistance for building funder-based plans.
  • Data Sources: The U.S. Census has been collecting information on persons and businesses since the late 18th century, and tackling this huge volume of data can be daunting. Our guide to U.S. Census data highlights many useful places to view or download this data, with the Product Comparisons tab providing in chart form a quick overview of product contents and features. Other tabs provide more details about these dissemination products, as well as about sources for Economic Census data.

In the areas of data science, mapping & GIS, data visualization, and data management, we cover many other topics and tools including ArcGIS, QGIS, Tableau, Python for tabular data and visualization, Adobe Illustrator, MS PowerPoint, effective academic posters, reproducibility, ethics of data management and sharing, and publishing research data. Access more resources and past recordings on our online learning page or go to our upcoming workshops list to register for a synchronous learning opportunity.

Change is coming – are you open to it?

This blog post is a collaboration between Paolo Mangiafico from ScholarWorks and Sophia Lafferty-Hess from the Center for Data and Visualization Sciences and the Duke Research Data Repository.

Open access journals have been around forOpen sign several decades, and almost all researchers have read them or published in them by now. Perhaps less well known are trends toward more openness in sharing of data, methods, code, and other aspects of research – broadly called open scholarship. There are lots of good reasons to make your research outputs as open as possible, and increasing support at Duke for doing it.

There are many different variants of “open” – including goals of making research accessible to all, making data and methods transparent to increase reproducibility and trust, licensing research to enable broad re-use, and engagement with a variety of stakeholders, among other things. All of these provide benefits to the public and they also provide benefits to Duke researchers. There’s growing evidence that openly available publications and data result in more citations and greater impact (Colavizza 2020), and showing one’s work and making it available for replication helps build greater trust. There’s greater potential economic impact when others can build on research more quickly, and more avenues for collaboration and interdisciplinary engagement.

Recognizing the importance of making research outputs quickly and openly available to other researchers and the public, and supporting greater transparency in research, many funding agencies are now encouraging or requiring it. NIH has had a public access policy for over a decade, and NSF and other agencies have followed with similar policies. NIH has also released a new Data Management and Sharing policy that goes into effect in 2023 with more robust and clearer expectations for how to effectively share data. In Europe, government research funders back a program called Plan S, and in the United States, the recently passed U.S. Innovation and Competition Act (S. 1260) includes provisions that instruct federal agencies to provide free online public access to federally-funded research “not later than 12 months after publication in peer-reviewed journals, preferably sooner.”

The USICA bill aims to maximize the impact of federally-funded research by ensuring that final author manuscripts reporting on taxpayer-funded research are:

  • Deposited into federally designated or maintained repositories;
  • Made available in open and machine-readable formats; 
  • Made available under licenses that enable productive reuse and computational analysis; and
  • Housed in repositories that ensure interoperability and long-term preservation.

Duke got a head start on supporting researchers in making their publications open access in 2010, when Academic Council adopted an open access policy, which since then has been part of the Faculty Handbook (Appendix P). The policy provides the legal basis for Duke faculty to make their own research articles openly available on a personal or institutional website via a non-exclusive license, while also making it possible to comply with any requirements imposed by their journal or funder. Shortly after the policy was adopted, Duke Libraries worked with the Provost’s office to implement a service making open access easy for Duke researchers. DukeSpace, a repository integrated with the Scholars@Duke profile system, allows you to add a publication to your profile and deposit it to Duke’s open access archive in a single step, and have the open access link included in your citations alongside the link to the published version.

Duke Libraries also support a research data repository and services to help the Duke community organize, describe, and archive their research data for open access. This service, with support from the Provost’s office, provides both the infrastructure and curation staff to help Duke researchers make their data FAIR (Findable, Accessible, Interoperable, and Reusable). By publishing datasets with digital object identifiers (DOIs) and data citations, we create a value chain where making data available increases their impact and positions them as standalone research objects. The importance of data sharing specifically is also being formalized at Duke through the current Research Data Policy Initiative, which has a stated mission to “facilitate efficient and quality research, ensure data quality, and foster a culture of data sharing.” Together the Duke community is working to develop services, processes, procedures, and policies that broaden our contributions to society through public access to the outputs of our research.

Are you ready to make your work open? You can find more information about how to deposit your publications and data for open access at Duke on the ScholarWorks website, and consultants from Duke Libraries’ ScholarWorks Center for Scholarly Publishing and Center for Data and Visualization Sciences are available to help you find the best place to make your work open access, choose an appropriate license, and track how it’s being used.

Share More Data in the Duke Research Data Repository!

We are happy to announce expanded features for the public sharing of large scale data in the Duke Research Data Repository! The importance of open science for the public good is more relevant than ever and scientific research is increasingly happening at scale. Relatedly, journals and funding agencies are requiring researchers to share the data produced during the course of their research (for instance see the newly released NIH Data Management and Sharing Policy). In response to this growing and evolving data sharing landscape, the Duke Research Data Repository team has partnered with Research Computing and OIT to integrate the Globus file transfer system to streamline the public sharing of large scale data generated at Duke. The new RDR features include:

  • A streamlined workflow for depositing large scale data to the repository
  • An integrated process for downloading large scale data (datasets over 2GB) from the repository
  • New options for exporting smaller datasets directly through your browser
  • New support for describing and using collections to highlight groups of datasets generated by a project or group (see this example)
  • Additional free storage (up to 100 GB per deposit) to the Duke community during 2021!

While using Globus for both upload and download requires a few configuration steps by end users, we have strived to simplify this process with new user documentation and video walk-throughs. This is the perfect time to share those large(r) datasets (although smaller datasets are also welcome!).

Contact us today with questions or get started with a deposit!

Publish Your Data: Researcher Highlight

This post was authored by Shadae Gatlin, DUL Repository Services Analyst and member of the Research Data Curation Team.

Collaborating for openness

The Duke University Libraries’ Research Data Curation team has the privilege to collaborate with exceptional researchers and scholars who are advancing their fields through open data sharing in the Duke Research Data Repository (RDR). One such researcher, Martin Fischer, Ph.D., Associate Research Professor in the Departments of Chemistry and Physics, recently discussed his thoughts on open data sharing with us. A trained physicist, Dr. Fischer describes himself as an “optics person” his work ranges from developing microscopes that can examine melanin in tissues to looking at pigment distribution in artwork. He has published data in the RDR on more than one occasion and says of the data deposit process that, “I can only say, it was a breeze.”

“I can only say, it was a breeze.”

Dr. Fischer recalls his first time working with the team as being “much easier than I thought it was going to be.” When Dr. Fischer and colleagues experienced obstacles trying to setup OMERO, a server to host their project data, they turned to the Duke Research Data Repository as a possible solution to storing the data. This was Dr. Fischer’s first foray into open data publishing, and he characterizes the team as being  responsive and easy to work with. Due to the large size of the data, the team even offered to pick up the hard drive from Fischer’s office. After they acquired the data, the team curated, archived, and then published it, resulting in Fischer’s first dataset in the RDR.

Why share data?

When asked why he believes open data sharing is important, Dr. Fischer says that “sharing data creates an opportunity for others to help develop things with you.” For example, after sharing his latest dataset  which evaluates the efficacy of masks to reduce the transmission of respiratory droplets, Fischer received requests for a non-proprietary option for data analysis instead of using the team’s data analysis scripts written for the commercial program Mathematica. Peers offered to help develop a Python script, which is now openly available, and for which the developers used the RDR data as a reference. As of January 2021, the dataset has had 991 page views.

Dr. Fischer appreciates the opportunity for research development that open data sharing creates, saying, “Maybe somebody else will develop a routine, or develop something that is better, easier than what we have”. Datasets deposited in the RDR are made publicly available for download and receive a permanent DOI link, which makes the data even more accessible.

“Maybe somebody else will develop a routine, or develop something that is better, easier than what we have.”

In addition to the benefits of long-term preservation and access that publishing data in the RDR provides, Dr. Fischer finds that sharing his data openly encourages a sense of accountability. “I don’t have a problem with other people going in and trying, and making sure it’s actually right. I welcome the opportunity for feedback”. With many research funding agencies introducing policies for research data management and data sharing practices, the RDR is a great option for Duke researchers. Every dataset that is accepted into the RDR is carefully curated to meet FAIR guidelines and optimized for future reuse.

Collaborating with researchers like Dr. Martin Fischer is one of the highlights of working on the Research Data Curation team. We look forward to seeing what fascinating data 2021 will bring to the RDR and working with more Duke researchers to share their data with the world.

Dr. Fischer’s Work in the Duke Research Data Repository:

  • Wilson, J. W., Degan, S., Gainey, C. S., Mitropoulos, T., Simpson, M. J., Zhang, J. Y., & Warren, W. S. (2019). Data from: In vivo pump-probe and multiphoton fluorescence microscopy of melanoma and pigmented lesions in a mouse model. Duke Digital Repository. https://doi.org/10.7924/r4cc0zp95
  • Fischer, E., Fischer, M., Grass, D., Henrion, I., Warren, W., Westman, E. (2020). Video data files from: Low-cost measurement of facemask efficacy for filtering expelled droplets during speech. Duke Research Data Repository. V2 https://doi.org/10.7924/r4ww7dx6q

Got Data? Data Publishing Services at Duke Continue During COVID-19

While the library may be physically closed, the Duke Research Data Repository (RDR) is open and accepting data deposits. If you have a data sharing requirement you need to meet for a journal publisher or funding agency we’ve got you covered. If you have COVID-19 data that can be openly shared, we can help make these vital research materials available to the public and the research community today. Or if you have data that needs to be under access restrictions, we can connect you to partner disciplinary repositories that support clinical trials data, social science data, or qualitative data.

Speaking of the RDR, we just completed a refresh on the platform and added several features!

In-line with data sharing standards, we also assign a digital object identifier (DOI) to all datasets, provide structured metadata for discovery, curate data to further enhance datasets for reuse and reproducibility, provide safe archival storage, and a standardized citation for proper acknowledgement.

Openness supports the acceleration of science and the generation of knowledge. Within the libraries we look forward to partnering with Duke researchers to disseminate their research data! Visit https://research.repository.duke.edu/ to learn more or contact datamanagement@duke.edu with any questions.

Duke University Libraries Partners with the Qualitative Data Repository

Duke University Libraries has partnered with the Qualitative Data Repository (QDR) as an institutional member to provide qualitative data sharing, curation, and preservation services to the Duke community. QDR is located at Syracuse University and has staff and infrastructure in place to specifically address some of the unique needs of qualitative data including curating data for future reuse, providing mediated access, and assisting with Data Use Agreements.

Duke University Libraries has long been committed to helping our scholars make their research openly accessible and stewarding these materials for the future. Over the past few years, this has included launching a new data repository and curation program, which accepts data from any discipline as well as joining the Data Curation Network. Now through our partnership with QDR we can further enhance our support for sharing and archiving qualitative data.

Qualitative data come in a variety of forms including interviews, focus groups, archival materials, textual documents, observational data, and some surveys. QDR can help Duke researchers have a broader impact through making these unique data more widely accessible.

“Founded and directed by qualitative researchers, QDR is dedicated to helping researchers share their qualitative data,” says Sebastian Karcher, QDR’s associate director. “Informed by our deep understanding of qualitative research, we help researchers share their data in ways that reflect both their ethical commitments and do justice to the richness and diversity of qualitative research. We couldn’t be more excited to continue our already fruitful partnership with Duke University Libraries”

Through this partnership, Duke University Libraries will have representation on the governance board of QDR and be involved in the latest developments in managing and sharing qualitative data. The libraries will also be partnering with QDR to provide virtual workshops in the spring semester at Duke to enhance understanding around the sharing and management of qualitative research data.

If you are interested in learning more about this partnership, contact datamanagement@duke.edu.

OSF@Duke: By the Numbers and Beyond

The Open Science Framework (OSF) is a data and project management platform developed by the Center for Open Science that is designed to support the entire research lifecycle. OSF has a variety of features including file management and versioning, integration with third-party tools, granular permissions and sharing capabilities, and communication functionalities. It also supports growing scholarly communication formats including preprints and preregistrations, which enable more open and reproducible research practices.

In early 2017, Duke University became a partner institution with the OSF. As a partner institution, Duke researchers can sign into the OSF using their NetID and affiliate a project with Duke, which allows it to be displayed on the Duke OSF page. After 2 years of supporting OSF for Institutions here at Duke, the Research Data Management (RDM) team wanted to gain a better perspective surrounding how our community was using the tool and their perceptions. 

As of March 10, 2019, Duke has 202 users that have signed into the system using their Duke credentials (and there are possibly more users that are authenticating using personal email accounts). Of these users, 177 total projects have been created and affiliated with Duke. Forty-six of these projects are public and 132 remain private. Duke users have also registered 80 Duke affiliated projects, 62 of which are public and 18 are embargoed. A registration is a time-stamped read-only copy of an OSF project that can be used to preregister a research design, to create registered reports for journals, or at the conclusion of a project to formally record the authoritative copy of materials.

But what do OSF users think of the tool and how are they using it within their workflows? A few power users shared their thoughts:

Optimizing research workflows: A number of researchers noted how the OSF has helped streamline their workflows through creating a “central place that everyone has access to.” OSF has helped “keeping track of the ‘right’ version of things” and “bypassing the situation of having different versioned documents in different places.” Additionally, the OSF has supported “documenting workflow pipelines.”

Facilitating collaboration: One of the key features of the OSF is that researchers, regardless of institutional affiliation, can contribute to a project and integrate the tools they already use. Matt Makel, Director of Research at TIP, explains how OSF supports his research – “I collaborate with many colleagues at other institutions. OSF solves the problem of negotiating which tools to use to share documents. Rather than switching platforms across (or worse, within) projects, OSF is a great hub for our productivity.”

Offering an end-to-end data management solution: Some research groups are also using OSF in multiple stages of their projects and for multiple purposes. As one researcher expressed – “My research group uses OSF for every project. That includes preregistration and archiving research materials, data, data management and analysis syntax, and supplemental materials associated with publications. We also use it to post preprints to PsyArXiv.”

It also surfaced that OSF supported an ideological perception regarding a shift in the norms of scholarly communication. As Elika Bergelson, Crandall Family Assistant Professor in Psychology and Neuroscience, aptly put it “Open science is the way of the future.” Here within Duke University Libraries, we aim to continue to support these shifting norms and the growing benefits of openness through services, platforms, and training.

To learn more about how the OSF might support your research, join us on April 3 from 10-11 am for hands-on OSF workshop. Register here: https://duke.libcal.com/event/4803444

If you have other questions about using the OSF in a project, the RDM team is available for consultations or targeted demonstrations or trainings for research teams. We also have an OSF project that can help you understand the basic features of the tool.

Contact askdata@duke.edu to learn more or request an OSF demonstration.

Computational Reproducibility Pilot – Code Ocean Trial

A goal of Duke University Libraries (DUL) Code Ocean Logois to support the  growing and changing needs of the Duke research community. This can take many forms. Within Data and Visualization Services, we provide learning opportunities, consulting services, and computational resources to help Duke researchers implement their data-driven research projects. Monitoring and assessing new tools and platforms also helps DUL stay in tune with changing research norms and practices. Today the increasing focus on the importance of transparency and reproducibility has resulted in the development of new tools  and resources to help researchers produce and share more reproducible results. One such tool is Code Ocean.

Code Ocean is a computational reproducibility platform that employs Docker technology to execute code in the cloud. The platform does two key things—it integrates the metadata, code, data and dependencies into a single ‘compute capsule’, ensuring that the code will run—and it does this in a single web interface that displays all inputs and results. Within the platform, it is possible to develop, edit or download the code, run routines, and visualize, save or download output, all from a personal computer. Users or reviewers can upload their own data and test the effects of changing parameters or modification of the code. Users can also share their data and code through the platform. Code Ocean provides a DOI for all capsules facilitating attribution and a permanent connection to any published work.

In order to help us understand and evaluate the usefulness of the Code Ocean platform to the Duke research community, DUL will be offering trial access to the Code Ocean cloud-based computational reproducibility platform starting on October 1, 2018. To learn more about what is included in the trial access and to sign up to participate, visit the Code Ocean pilot portal page.

If you have any questions, contact askdata@duke.edu.

Highlights from Expanding our Research Data Management Program

Since the launch of our expanded research data management (RDM) program in January, the Research Data Management Team in DVS has been busy defining and implementing our suite of services. Our “Lifecycle Services” are designed to assist scholars at all stages of their research project from the planning phase to the final curation and disposition of their data in an archive or repository. Our service model centers on four key areas: data management planning, data workflow design, data and documentation review, and data repository support. Over the past nine months, we have  worked with Duke researchers across disciplines to provide these services, allowing us to see their value in action. Below we present some examples of how we have supported researchers within our four support areas.

Data Management Planning

With increasing data management plan requirements Data Management Planningas well as growing  expectations that funding agencies will more strictly enforce and evaluate these plans, researchers are seeking assistance ensuring their plans comply with funder requirements. Through in-person consultations and online review through the DMPTool, we have helped researchers enhance their DMPs for a variety of funding agencies including the NSF Sociology Directorate, the Department of Energy, and the NSF Computer & Information Science & Engineering (CISE) Program.

Data Workflow Design

As research teams begin a project there are a variety Data Workflow Designof organizational and workflow decisions that need to be made from selecting appropriate tools to implementing storage and backup strategies (to name a few). Over the past 6 months, we have had the opportunity to help a multi-institutional Duke Marine Lab Behavioral Response Study (BRS) implement their project workflow using the Open Science Framework (OSF). We have worked with project staff to think through the organization of materials, provided training on the use of the tool, and strategized on storage and backup options.

Data and Documentation Review

During a project, researchers make decisions about how to format, Data and Documentation Reviewdescribe, and structure their data for sharing and preservation. Questions may also arise surrounding how to ethically share human subjects data and navigate intellectual property or copyright issues. In conversations with researchers, we have provided suggestions for what formats are best for portability and preservation, discussed their documentation and metadata plans, and helped resolve intellectual property questions for secondary data.

Data Repository Support

At the end of a project, researchers may be required Data Repository Supportor choose to deposit their data in an archive or repository. We have advised faculty and students on repository options based on their discipline, data type, and repository features. One option available to the Duke community is the Duke Digital Repository. Over the past nine months, we have assisted with the curation of a variety of datasets deposited within the DDR, many of which underlie journal publications.

This year Duke news articles have featured two research studies with datasets archived within the DDR, one describing a new cervical cancer screening device and another presenting cutting-edge research on a potential new state of matter. The accessibility of both Asiedu et al.’s screening device data and Charbonneau and Yaida’s glass study data enhances the overall transparency and reproducibility of these studies.

Our experiences thus far have enabled us to better understand the diversity of researchers’ needs and allowed us to continue to hone and expand our knowledge base of data management best practices, tools, and resources. We are excited to continue to work with and learn from researchers here at Duke!