Category Archives: Data Curation

Open Science Framework @ Duke

Center for Open ScienceThe Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science (COS). OSF offers many features that can help scholars manage their workflow and outputs throughout the research lifecycle. From collaborating effectively, to managing data, code, and protocols in a centralized location, to sharing project materials with the broader research community, the OSF provides tools that support openness, research integrity, and reproducibility. Some of the key functionalities of the OSF include:

  • Integrations with third-party tools that researchers already use (i.e., Box, Google Drive, GitHub, Mendeley, etc.)
  • Hierarchical organizational structures
  • Unlimited native OSF storage*
  • Built-in version control
  • Granular privacy and permission controls
  • Activity log that tracks all project changes
  • Built-in collaborative wiki and commenting pane
  • Analytics for public projects
  • Persistent, citable identifiers for projects, components, and files along with Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs) available for public OSF projects
  • And more!

Duke University is a partner institution with OSF, meaning  you can sign into the OSF using your NetID and affiliate your projects with Duke. Visit the Duke OSF page to see some Duke research projects and outputs from our community.

Duke University Libraries has also partnered with COS to host a workshop this fall entitled “Increasing Openness and Reproducibility in Quantitative Research.” This workshop will teach participants how they can increase the reproducibility of their work and will include hands-on exercises using the OSF.

Workshop Details
Date: October 3, 2017
Time: 9 am to 12 pm

If you are interested in affiliating an existing OSF project, want to learn more about how the OSF can support your workflow, or would like a demonstration of the OSF, please contact

*Individual file size limit of 5 GB. Users can upload larger files by connecting third party add-ons to their OSF projects.

Fall Data and Visualization Workshops

2017 Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series.  Our workshops are designed for researchers who are new to data driven research as well as those looking to expand skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of different data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro


11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 


09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 













Love Your Data Week (Feb. 13-17)

In cooperation with the Triangle Research Library Network, Duke Libraries will be participating in Love Your Data Week on February 13-17, 2017. Love Your Data Week is an international event to help researchers take better care of their data. The campaign focuses on raising awareness and building community around data management, sharing, preservation, and reuse.

The theme for Love Your Data Week 2017 is data quality, with a related message for each day.

  • Monday: Defining Data Quality
  • Tuesday: Documenting, Describing, and Defining
  • Wednesday: Good Data Examples
  • Thursday: Finding the Right Data
  • Friday: Rescuing Unloved Data

Throughout the week, Data and Visualization Services will be contributing to the conversation on Twitter (@duke_data). We will also host the following local programming related to the daily themes:

In honor of Love Your Data Week chocolates will be provided at these workshops!

The new Research Data Management staff at the Duke Libraries are available to help researchers care for their data through consultations, support services, and instruction.  We can assist with writing data management plans that comply with funder policies, advise on data management best practices, and facilitate the ingest of data into repositories. To learn more about general data management best practices, see our newly updated RDM guide

Contact us at to find out how we can help you love your data! 

Get involved in Love Your Data Week by following the conversation at #LYD17, #loveyourdata, and #trlndata.

All promotional Love Your Data 2017 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Bass, M., Neeser, A., Atwood, T., and Coates, H. (2017). Love Your Data Week Promotional Materials. [image files]. Retrieved from

New Data Management Services @ Duke

Data ManagementDuke Libraries are happy to announce a new set of research data management services designed to help researchers secure grant funding, increase research impact, and preserve valuable data. Building on the recommendations of the Digital Research Faculty Working Group and the Duke Digital Research Data Services and Support report, Data and Visualization Services have added two new research data management consultants who are available to work with researchers across the university and medical center on a broad range of data management concerns from data creation to data curation.

Interested in learning more about data management?

Our New Data Management Consultants

sophialh2Sophia Lafferty-Hess attended the University of North Carolina at Chapel Hill where she received a Master of Science in Information Science and Master of Public Administration. Prior to coming to Duke, Sophia worked at the Odum Institute for Research in Social Science at UNC-Chapel Hill within the Data Archive as a Research Data Manager. In this position, Sophia provided consultations to researchers on data management best practices, curated research data to support long-term preservation and reuse, and provided training and instruction on data management policies, strategies, and tools.

While at Odum, Sophia also helped lead the development of a data curation and verification service for journals to help enforce data sharing and replication policies, which included verifying that data meet quality standards for reuse and that the data and code can properly reproduce the analytic results presented in the article. Sophia’s current research interests include the impact of journal data sharing policies on data availability and the development of data curation workflows.

jen2Jen Darragh comes to us from Johns Hopkins University where she served for the past seven years as the Data Services and Sociology Librarian, and Hopkins Population Center Restricted Projects Coordinator.  In this position, Jen  developed the libraries’ Restricted Data Room and designed the secure data enclave spaces and staff support for the Johns Hopkins Population Center.

Jen received her Bachelor of Arts Degree in Psychology from Westminster College (PA) and her Master of Library and Information Sciences degree from the University of Pittsburgh.  She has been involved with socio-behavioral research data throughout her career.  Jen is particularly interested in the development of centralized, controlled data access for sensitive human subjects’ data (subject to HIPAA or FERPA requirements) to facilitate broader, yet more secure sharing of existing research data as a means to produce new, cutting-edge research.


Duke Libraries and SSRI welcome Mara Sedlins!

On behalf of Duke Libraries and the Social Science Research Institute, I am happy to welcome Mara Sedlins to Duke.  As the library and SSRI work to develop a rich set of data management, analysis, and archiving strategies for Duke researchers, Mara’s postdoctoral position provides a unique opportunity to work closely with researchers across campus to improve both training and workflows for data curation at Duke.  – Joel Herndon, Head of Data and Visualization Services, Duke Libraries  

2016-08-25 11.06.17 HDRI am excited to join the Data and Visualization Services team this fall as a postdoctoral fellow in data curation for the social sciences (sponsored by CLIR and funded by the Alfred P. Sloan Foundation). For the next two years, I will be working with Duke Libraries and the Social Science Research Institute to develop best practices for managing a variety of research data in the social sciences.

My research background is in social and personality psychology. I received my PhD at the University of Washington, where I worked to develop and validate a new measure of automatic social categorization – to what extent do people, automatically and without conscious awareness, sort faces into socially constructed categories like gender and race? The measure has been used in studies examining beliefs about human genetic variation and the racial labels people assign to multiracial celebrities like President Barack Obama.

While in Seattle, I was also involved in several projects at Microsoft Research assessing computer-supported cooperative work technologies, focusing on people’s preferences for different types of avatar representations, compared to video or audio-only conferencing. I also have experience working with data from a study of risk factors for intimate partner violence, managing a database of donors and volunteers for a historical archive, and organizing thousands of high-resolution images for a large-scale digital comic art restoration project.

I look forward to applying the insights gained from working on a diverse array of data-intensive projects to the problem of developing and promoting best practices for data management throughout the research lifecycle.  I am particularly interested in questions such as:

  • How can researchers write actionable data management plans that improve the quality of their research?
  • What strategies can be used to organize and document data files during a project so that it’s easy to find and understand them later?
  • What steps need to be taken so that data can be discovered and re-used effectively by other researchers?

These are just a few of the questions that are central to the rapidly evolving field of data curation for the sciences and beyond.


New Year- New Data and Visualization Lab!

Data and Visualization Services is happy to announce our new Data and Visualization Lab in Duke Libraries new Edge research space.  Located on the first floor of the Bostock Library, the Brandaleone Family Lab for Data and Visualization Services offers a dedicated space for researchers working on data driven projects.

The lab features three distinct areas for supporting data driven research.

Data and Visualization Lab Space

Data and Visualization Lab Computing Zone

Our lab space features twelve high end workstations with dual monitors with the latest software for data visualization, digital mapping, statistics, and qualitative research.  All of the machines have two dedicated displays to encourage collaborative work and data consultations.  Additionally, all twelve machines have a dedicated power port located conveniently under the edge of the table for powering a laptop or usb powered device.

Bloomberg Professional “Bar”


Since the launch of our Bloomberg terminals, we have seen a steady increase in both individual and team based usage of Bloomberg financial data.  Our three Bloomberg Professional workstations are now located on a dedicated “bar” across from our lab machines.  The  new Bloomberg zone will facilitate collaborate work and provide a base for groups such as the Duke University Investment Club and Duke Financial Economics Center.

Consult and Collaborative SpaceCollaboration Zone

Our third lab space provides a set of four rolling tables for small groups to collaborate or for projects that don’t require a fixed computing space.   An 85″ flat panel display near this zone features data visualizations and other data driven research projects at Duke.

Come See Us!

With ample natural light,  almost 24/7 availability, and a welcoming staff eager to work with you on your next data driven project.  We look forward to working with you in the upcoming year!

Meet Data and Visualization Services

Data and Visualization Services LogoThe fall of 2014 marks the completion of the first five years of the libraries’ Data and GIS Services Department. In 2009, when Mark Thomas and I formed the department, the name accurately reflected our staffing and services as Mark focused on GIS-related issues and I focused on data-related issues. As an increasing number of scholars have embraced data-driven research over the last five years , our services and staff have grown to support an increasingly diverse set of research needs at Duke.

In 2010-2011 academic year, the Libraries launched services around data management and sharing plans in anticipation of new funding rules surrounding research data. In 2012, the library expanded data services in collaboration with OIT’s Research Computing to offer one of the first data visualization consulting positions in the country. In 2013 and 2014, we expanded services and staff to include consultations on research computing and big data.

At this year’s Data and GIS Services annual retreat, we decided that the time has come to change the name of the department to reflect the broader range of staff and consulting services available. While we continue to support our traditional dimensions of data and GIS research, we intend to support a range of data needs across the following five themes:

Data and Visualization Services Themes

Data Sources
Get the data you need. Data and Visualization Services consultants can help you locate and license a diverse range of data sources.  We also provide long term storage for Duke data collections through Duke’s institutional repository.

Data Storage and Management
Need help on a data management plan, want advice on archiving, or struggling with “big data” analytics?  We are happy to consult!

Data Cleaning and Analysis
From Google Refine to the command line, we can help with data cleaning and analysis.

Mapping and GIS
Mapping and spatial analysis remain a core service for the data and visualization program.

Data Visualization
Our data visualization service can help with the most effective way to represent your data for both analysis and communication.


We appreciate the research community’s support as we’ve grown over the last five years.  We look forward to working with you on a larger range of data challenges in the future!

Top 10 List – Data and GIS Edition

As we begin our summer in Data and GIS Services, we spend this post reflecting back on some of the services, software, and tools that made data work this spring more productive and more visible.  We proudly present our top 10 list for the Spring 2014 semster:

10. DMPTool
While we enjoy working directly with researchers crafting data management plans, we realize that some data management needs arise outside of consultation hours.  Fortunately, the Data Management Planning Tool (DMPTool) is there 24/7 to provide targeted guidance on data management plans for a range of granting agencies.

9. Fusion Tables
A database in the cloud that allows you to query and visualize your data, Fusion Tables has proven a powerful tool for researchers who need database functionality but don’t have time for a full featured database.  We’ve worked with many groups to map their data in the cloud; see the Digital Projects blog for an example.  Fusion Tables is a regular workshop in Data and GIS.

8. Open Refine
You could learn the UNIX command line and a scripting language to clean your data, but Open Refine opens data cleaning to a wider audience that is more concerned with simplicity than syntax.  Open Refine is also a regular workshop in Data and GIS.

7. R and RStudio
A programming language that excels at statistics and data visualization, R offers a powerful, open source solution to running statistics and visualizing complex data.  RStudio provides a clean, full-featured development environment for R that greatly enhances the analysis process.

6. Tableau Public
Need a quick, interactive data visualization that you can share with a wide audience?  Tableau Public excels at producing dynamic data visualizations from a range of different datasets and provides intuitive controls for letting your audience explore the data.

5. ArcOnline
ArcGIS has long been a core piece of software for researchers working with digital maps.  ArcOnline extends the rich mapping features of ArcGIS into the cloud, allowing a wider audience to share and build mapping projects.

4. Pandas
A Python library that brings data analysis and modeling to the Python scripting language, Pandas brings the ease and power of Python to a range of data management and analysis challenges.

3. RAW
Paste in your spreadsheet data, choose a layout, drag and drop your variables… and your visualization is ready.  Raw makes it easy to go from data to visualization using an intuitive, minimal interface.

2. Stata 13
Another core piece of software in the Data and GIS Lab (and at Duke), Stata 13 brought new features and flexibility (automatic memory management — “hello big data”) that were greatly appreciated by Duke researchers.

1. R Markdown
While many librarians tell people to “document your work,” R Markdown makes it easy to document your research data, explain results, and embed your data visualizations using a minimal markup language that works in any text editor and ties nicely into the R programming language.   For pulling it all together, R Markdown is number one in our top ten list!

We hope you’ve enjoyed the list!  If you are interested in these or other data tools and techniques, please contact us at!

Scaling Support: Designing Data for a Growing Statistics Program

r_stats101How do you support 57,860 online students learning R and statistics ?  Late last fall, Data and GIS Services shared this challenge with Professor Mine Çetinkaya-Rundel and the staff of CIT as we sought to translate Professor Çetinkaya-Rundel’s successful Statistics 101 course to a Coursera class on Data Analysis and Statistical Inference.  While Data and GIS Services has supported Statistics 101 students for several years identifying appropriate data and using the R statistical language for their assignments, the scale of the Coursera course introduced new challenges of trying to provide engaging data to a very large audience without having the opportunity to provide direct support to everyone in the class.

In our initial meetings with Professor Çetinkaya-Rundel, she requested that Data and GIS create data collections for the course that would provide easy access in R and would include a range of statistical measures that would appeal to the diverse audience in the class.  The first challenge — easy access to R — required some translation work.  While R excels in its flexibility, graphics, and statistical power, it lacks some of the built in data documentation features present in other statistical packages.  This project prompted Data and GIS to reconsider how to provide documentation and pre-formatted R data to an audience that would likely be unfamiliar with R and data documentation.

The second challenge — finding data that covered a wide range of interesting topics — proved much easier.  The General Social Survey with its diverse and engaging questions on a wide range of topics proved to be an easy choice for the class.  The American National Election Studies, also offered a diverse set of measures of public opinion that suited the course well.  With these challenges identified and addressed, we spent the end of 2013 selecting portions of the data for class (subsetting), abridging the data documentation for instructional use, and transforming the data to address its usage in an online setting (processing missing values for R, creating factor variables).

As Professor Çetinkaya-Rundel’s class launches on February 17th, this project has given us a new appreciation of providing data and statistical services in a MOOC while also building course materials that we are using in Statistics 101 at Duke.  While students begin the Coursera course on Data Analysis and Statistical Inference, students in Professor Kari Lock Morgan’s Statistics 101 class will use these data in their on-campus Duke course as well.  We hope that both collections will reduce some of the technological hurdles that often confront courses using R as well as improving statistical literacy at Duke and beyond.

Data and GIS Services Spring 2014 Workshop Series

DGSwkshpExplore network analysis, text mining, online mapping, data visualization, and statistics in our spring 2014 workshop series.  Our workshops provide a chance to explore new tools or refresh your memory on effective strategies for managing digital research.  Interested in keeping up to date with workshops and events in Data and GIS?  Subscribe to the dgs-announce listserv or follow us on Twitter (@duke_data).

Currently Scheduled Workshops

 Thu, Jan 9 2:00 PM – 3:30 PM  Data Management Plans – Grants, Strategies, and Considerations

 Mon, Jan 13 2:00 PM – 3:30 PM Webinar: Social Science Data Management and Curation

 Mon, Jan 13 3:00 PM – 4:00 PM Google Fusion Tables

 Tue, Jan 14 3:00 PM – 4:00 PM Open (aka Google) Refine 

 Wed, Jan 15 1:00 PM – 3:00 PM Stata for Research

 Thu, Jan 16 3:00 PM – 5:00 PM Analysis with R

 Tue, Jan 21 1:00 PM – 3:00 PM Introduction to ArcGIS

 Wed, Jan 22 1:00 PM – 3:00 PM ArcGIS Online

 Wed, Jan 22 3:00 PM – 4:00 PM Open (aka Google) Refine 

 Mon, Jan 27 2:00 PM – 3:30 PM Introduction to Text Analysis

 Wed, Jan 29 1:00 PM – 3:00 PM Analysis with R

 Thu, Jan 30 2:00 PM – 4:00 PM Stata for Research

 Mon, Feb 3 1:00 PM – 2:00 PM  Data Visualization on the Web

 Mon, Feb 3 2:00 PM – 3:00 PM  Data Visualization on the Web (Advanced)

 Tue, Feb 11 2:00 PM – 4:00 PM Using Gephi for Network Analysis and Visualization

 Wed, Feb 12 1:00 PM – 3:00 PM Introduction to ArcGIS

 Tue, Feb 18 2:00 PM – 3:30 PM Introduction to Tableau Public 8

 Tue, Feb 25 1:00 PM – 3:00 PM ArcGIS Online

 Thu, Feb 27 1:00 PM – 3:00 PM Historical GIS

 Mon, Mar 3 2:00 PM – 3:30 PM  Designing Academic Figures and Posters

 Tue, Mar 4 1:00 PM – 3:00 PM  Useful R Packages: Extensions for Data Analysis, Management, and Visualization