Duke Libraries and SSRI welcome Mara Sedlins!

On behalf of Duke Libraries and the Social Science Research Institute, I am happy to welcome Mara Sedlins to Duke.  As the library and SSRI work to develop a rich set of data management, analysis, and archiving strategies for Duke researchers, Mara’s postdoctoral position provides a unique opportunity to work closely with researchers across campus to improve both training and workflows for data curation at Duke.  – Joel Herndon, Head of Data and Visualization Services, Duke Libraries  

I am excited to join the Data and Visualization Services team this fall as a postdoctoral fellow in data curation for the social sciences (sponsored by CLIR and funded by the Alfred P. Sloan Foundation). For the next two years, I will be working with Duke Libraries and the Social Science Research Institute to develop best practices for managing a variety of research data in the social sciences.

My research background is in social and personality psychology. I received my PhD at the University of Washington, where I worked to develop and validate a new measure of automatic social categorization – to what extent do people, automatically and without conscious awareness, sort faces into socially constructed categories like gender and race? The measure has been used in studies examining beliefs about human genetic variation and the racial labels people assign to multiracial celebrities like President Barack Obama.

While in Seattle, I was also involved in several projects at Microsoft Research assessing computer-supported cooperative work technologies, focusing on people’s preferences for different types of avatar representations, compared to video or audio-only conferencing. I also have experience working with data from a study of risk factors for intimate partner violence, managing a database of donors and volunteers for a historical archive, and organizing thousands of high-resolution images for a large-scale digital comic art restoration project.

I look forward to applying the insights gained from working on a diverse array of data-intensive projects to the problem of developing and promoting best practices for data management throughout the research lifecycle.  I am particularly interested in questions such as:

  • How can researchers write actionable data management plans that improve the quality of their research?
  • What strategies can be used to organize and document data files during a project so that it’s easy to find and understand them later?
  • What steps need to be taken so that data can be discovered and re-used effectively by other researchers?

These are just a few of the questions that are central to the rapidly evolving field of data curation for the sciences and beyond.


Fall 2016 DVS Workshop Series

Data and Visualization Services is happy to announce its Fall 2016 Workshop Series. Learn new ways of enhancing your research with a wide range of data driven research methods, data tools, and data sources.

Data Sources
Data Cleaning and Analysis
Data Analysis
Introduction to Stata (Two sessions: Sep 21, Oct 18)
Mapping and GIS
Introduction to ArcGIS (Two sessions: Sep 14, Oct 13)
ArcGIS Online (Oct 17)
Data Visualization

Visualizing Qualitative Data (Oct 19)
Visualizing Basic Survey Data in Tableau – Likert Scales (Nov 10)

Data Fest 2016 Workshop Series

Duke Libraries are happy to welcome the 2016 ASA DataFest to the Edge on April 1-3. As part of DataFest 2016, the Edge is hosting five DataFest-related workshops designed to help teams and others interested in data-driven research expand their skills.

DataFest Workshop Series

Data Analysis with Python
Tuesday, March 22
6:00-9:00 PM
This will be a hands-on class focused on performing data analysis with Python. We’ll help participants set up their Jupyter Notebook development environment, cover the basic functions for reading and manipulating data, show examples of common statistical models and useful packages, and show some of the Python visualization tools.

Introduction to R
Wednesday, March 23
6:00-8:00 PM
Introduction to R as a statistical programming language. This session will introduce the basics of R syntax, getting data into R, various data types and classes, etc. The session assumes little or no background in R.

Data Munging with R and dplyr
Monday, March 28
6:00-8:00 PM
This session will demonstrate tools for manipulating and cleaning data in R. Most of the session will use the dplyr and tidyr packages. Some background in R is recommended. If you are not familiar with R, make sure to first attend the first R workshop in the series.

Data visualization with R, ggplot2, and shiny
Wednesday, March 30
6:00-8:00 PM
This session will demonstrate tools for static and interactive data visualization in R using the ggplot2 and shiny packages. Some background in R is recommended. If you are not familiar with R, make sure to first attend the first R workshop in the series.

EDA and Interactive Predictive Modeling with JMP
Thursday, March 31
4:00-6:00 PM
JMP® Statistical Discovery Software is dynamic, visual and interactive desktop software for Windows and Mac. In this hands-on workshop we will see tools for exploring, visualizing and preparing data in JMP. We’ll also learn how to fit a variety of predictive models, including multiple regression, logistic regression, classification and regression trees, and neural networks. A six-month license of JMP will be provided.

Story Maps

Telling Stories with Maps

“Story maps” are a popular method of telling place-themed stories and engaging with your audience over the web. Story maps are highly interactive, allowing users to follow along a path or time-line with links to content along the way. They’re also a great way to visualize current events and news topics in a way that brings perspective and context to important issues. As a student or researcher, you can use maps to tell a story about your research study area. In that sense, they can be a great tool for drawing attention to your work, and you could consider them another form of social media.

Creating a web map may seem like a challenge if you’ve never done it before, but there are several tools available online that can quickly and easily generate a story map. For this post, I’ll introduce you to two different types of story maps and suggest some free tools for creating your own.

Mapping Places or Events 

Story maps that cover a series of events are useful for contextualizing news events, giving an online tour, or linking to almost any kind of location-specific information. Story maps of this style are fun to use because they typically provide both a map and multimedia content. The user accesses the information in an interactive format, which is a great way for your message to sink in!

For example, I created this story map that links historic building photos of the Construction of Duke University to their locations on a map.

View the full size map here.

Some applications for this type of story map are publishing information about research areas, adding new points of access for digital humanities, or documenting travel or a field expedition.

Thematic Maps

Another popular style of story map is one that presents a series of thematic maps. These types of maps often depict how changes have occurred over time in a place or perhaps the unfolding of a news event. Side-by-side comparison of maps can also be a visually interesting way to illustrate an important issue. An interesting comparison map might show US Census demographic data from different census years in a city to show how its population has changed.

This map illustrates how manufacturing jobs have changed around Flint, Michigan, from 1990 to 2010.

Click here for full size.

There are also some really cool interactive features out there for this style of map like a tabbed viewer, a swipe or slide function between two different maps, and ESRI’s SpyGlass.

Create Your Own!

Some great tools are available to the Duke community and freely on the web that let you create these types of “story maps” with minimal training. Here are three tools you can use and what each does best…

StoryMap JS is a completely free and open access tool by the Knight Lab at Northwestern University. A Google account is necessary because StoryMap JS actually saves the maps you create in the recent folder of your Google Drive. StoryMap JS is incredibly easy to use, too. It has a very simple and intuitive interface that will let you start making your map in minutes. You can also use StoryMap JS for non-cartographic visual materials, and there is a cool off-shoot that allows you to instantly map 20 recent geo-tagged Instagram photos from any user account. Best Uses: Try using StoryMap JS when you’re telling a story that unfolds over a path or timeline. It’s also great for linking to media like photos or YouTube content.

Social Explorer You may have used Social Explorer before to gather US Census data, but you can also create thematic maps that you can share or embed in a website. With your Duke credentials, you have access to the Professional Edition. The data is pre-loaded, so you’re just a few clicks away from a beautifully shaded thematic map of US Census Data that you can share over the web. The map interface is user-friendly and has a “Change Layout” button at the bottom center that creates side-by-side and swipe comparison maps. You can also create an annotated presentation that lets the user cycle through a series of maps. Here is a quick example of a map presentation I made in Social Explorer. Best Uses: Social Explorer’s best use is for mapping US Census data. The “Tell a Story” function allows you to join graphs and other media to your map and create interactive presentation slides.

ESRI ArcGIS Online For more advanced users, or just those looking for more customization options, ArcGIS Online offers an abundance of tools and templates for creating attractive and engaging map presentations. ArcGIS Online Story Maps require an account with ESRI. You can sign up for a free public account, or, for more advanced features, you can request a free organizational account that is available to the Duke community. To take advantage of all ArcGIS Online has to offer, you will need to familiarize yourself with how to use it. Once you’ve made a few maps, you can load maps and multimedia content into any of ESRI’s Story Map Apps. Take a look at this gallery to view some examples of what you can do with Story Maps in ArcGIS Online. Even though there is a bit of a learning curve to ArcGIS Online, the payoff is huge.

Here is a customized slider map I made using the Story Map Swipe App that shows changes in North Carolina’s Congressional District Boundaries following the 2012 redistricting. Use the slider to swipe between views.

View the full size map here.

Best Uses: Fully customized story maps of any type. Great for telling place-based stories and presenting a series of thematic maps complete with multimedia content.

I hope you enjoyed viewing some of these story maps! I’m sure you can see that there are many different uses for this type of media. If you’ve made a cool story map, feel free to share it with us in the comments!

Data and GIS Fall 2013 Newsletter

Analyze, discover, manage, map, and visualize your data with Duke Libraries Data and GIS Services.  Our team of five consultants provides a broad range of support in areas including data analysis, data visualization, geographic information systems, financial data, statistical software, and data storage and management.

Data Management

Data Management Planning – DMPTool – Get 24/7 online help for your next data management plan, including information about Duke resources available for your data work.

Statistical Software Updates

Job Opportunities in Data and GIS Services

Data & GIS Services is hiring!  We have two open positions for student web programmers interested in working on data visualization projects.  See the Library Student employment page for more information on how to apply.  (The job can be found by searching for requisition number “DUL14-AMZ02”.)

New Data and Map Collections

CPS on Web (CPS Utilities Online)
CPS on Web is a set of utilities enabling you to access CPS data and documentation from this website.   You may make tables and graphs from the CPS data, download data extractions, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones.

Global Financial Data
Global Financial Data is a collection of financial and economic data provided in ASCII or Excel format. Data includes: long-term historical indices on stock markets; Total Return data on stocks, bonds, and bills; interest rates; exchange rates; inflation rates; bond indices; commodity indices and prices; consumer price indices; gross domestic product; individual stocks; sector indices; treasury bill yields; wholesale price indices; and unemployment rates covering over 200 countries.

LandScan Global
The LandScan Global Population Database provides global population distribution in a gridded GIS format at 30 arc-second resolution (approximately 1×1 km cells). Oak Ridge National Laboratory developed modeling techniques to disaggregate and interpolate census data within administrative boundaries to create a GIS layer showing population distribution as accurately and as timely as possible. EastView provides this data to use in GIS software as a WMS (Web Mapping Service) or as a WCS (Web Coverage Service) to allow a user to incorporate population distribution into GIS mapping and analysis.
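As a rough check on the "approximately 1×1 km" figure, the arithmetic behind a 30 arc-second cell can be sketched in a few lines of Python. This is only an illustration: the constant of about 111.32 km per degree assumes a spherical Earth, and the function name and the example latitude are hypothetical, not part of LandScan or EastView.

```python
import math

# Assumption: roughly 111.32 km per degree of arc on a spherical Earth.
KM_PER_DEGREE = 111.32


def cell_size_km(latitude_deg, arc_seconds=30.0):
    """Return (east-west width, north-south height) of a grid cell in km."""
    degrees = arc_seconds / 3600.0          # 30 arc-seconds = 1/120 of a degree
    height = degrees * KM_PER_DEGREE        # north-south extent is nearly constant
    # East-west extent shrinks with the cosine of the latitude.
    width = height * math.cos(math.radians(latitude_deg))
    return width, height


equator = cell_size_km(0.0)
mid_latitude = cell_size_km(36.0)  # hypothetical example latitude

print(f"Equator: {equator[0]:.3f} km x {equator[1]:.3f} km")
print(f"36 degrees N: {mid_latitude[0]:.3f} km x {mid_latitude[1]:.3f} km")
```

Cells are close to 1 km on a side only near the equator. Away from it they become narrower east to west, which matters when converting cell counts into areas.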

Data and GIS Spring Semester News

Clean your data with Google Refine.  Use digital maps to explore the present and past.  Analyze data with R or Stata. Visualize your research with one of our data visualization courses.  The Data and GIS Workshops offer a range of research strategies for data based questions.

Visualize This (and win a $500 technology prize)!

Are you a current Duke University undergraduate or graduate student? Have you used data visualization in a past or current research project to help solve a problem, tell a story, or highlight an interesting trend? Write up a short description and you’ll have a submission for the contest and a chance to win a $500 technology prize.

New Data Lab

As mentioned in the fall – with 12 workstations with dual 24″ monitors and 16 gigs of memory, the new Data and GIS lab is ready to take on the most challenging statistical, mapping, and visualization research projects. The new lab also features a flatbed scanner for projects moving from print to digital data. Lab hours are the same hours as Perkins Library (almost 24/7).

Get help with Data Management Planning

Puzzled by data management planning?  Not sure what to include in your grant’s data management plan?  Data and GIS has launched a guide that supports researchers looking for advice on data management plans now required by several granting agencies.  The guide provides examples of sample plans, key concepts involved in writing a plan, and contact information for groups on campus providing data management advice.

Online Mapping Tools – GeoCommons

Visualizing spatial data can be challenging.  Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data.  In this post, we take a detailed look at GeoCommons, a web-based tool for presenting spatial data.  (Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of GeoCommons.)


GeoCommons

GeoCommons is an online mapping application that easily imports a variety of data formats, including geospatial data, and quickly produces sharable maps.  In contrast to other mapping tools, GeoCommons contains several categorization algorithms, such as quantile classification and classification based on the standard deviation of the sample, that assist with the construction of informative maps.  CSV files and ArcGIS shapefiles are two of the most widely used file formats compatible with GeoCommons.
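To make the idea of quantile classification concrete, here is a minimal Python sketch using only the standard library. It is not GeoCommons code: the function names and the acreage values are hypothetical, and the "inclusive" interpolation method is just one of several reasonable ways to place the cut points.

```python
import bisect
import statistics


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 cut points that split values into equal-count groups."""
    return statistics.quantiles(values, n=n_classes, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class index of value given sorted cut points."""
    return bisect.bisect_right(breaks, value)


# Hypothetical values, e.g. median acres per farm for ten counties.
acres = [12, 45, 60, 75, 88, 95, 110, 130, 150, 420]

breaks = quantile_breaks(acres)
classes = [classify(v, breaks) for v in acres]

for value, cls in zip(acres, classes):
    print(f"{value:>4} acres -> class {cls}")
```

Quantile classes each hold about the same number of observations, so a single extreme value (420 here) does not leave most of the map in one color, as it can with equal-width intervals.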

GeoCommons is very easy to use and offers some of the display features found in high-end GIS suites.  Creation of new variables tied to geographies can be tricky, so it’s advised to either upload data and map in final form or to first identify the layer to which you will upload and join a complete data set.



Figure 1

To begin geocoding, upload a file.  GeoCommons has the capability to recognize spatially encoded data.  Some formats may require user assistance.

If you’ve uploaded data that contains latitude and longitude coordinates, choose this option.  In my case, I had county FIPS codes that uniquely identified each county.  Selecting US Boundaries to the left, then USA Counties, I was able to successfully preview how well my FIPS codes matched the layer (Figures 1 and 2).  A variety of other boundary types are available.  The key is to have in your data a unique identifier that identifies each record in the same manner as an available geocoding layer.

Figure 2

Review the geocoding results and select Continue to proceed.
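Before uploading, it can help to check locally how well your identifiers line up with the boundary layer. The sketch below is a hypothetical pre-check in plain Python, not a GeoCommons feature. It assumes five-digit county FIPS codes and pads codes whose leading zeros were dropped, which often happens when a spreadsheet stores them as numbers.

```python
def normalize_fips(code):
    """Return a county FIPS code as a zero-padded five-character string."""
    return str(code).strip().zfill(5)


def match_report(data_codes, layer_codes):
    """Split data codes into those found in the layer and those missing from it."""
    layer = {normalize_fips(c) for c in layer_codes}
    matched, unmatched = [], []
    for code in data_codes:
        key = normalize_fips(code)
        (matched if key in layer else unmatched).append(key)
    return matched, unmatched


# Hypothetical inputs: codes as they might appear in a data file and in a layer.
data_codes = [37063, "37135", 1001, "99999"]
layer_codes = ["37063", "37135", "01001"]

matched, unmatched = match_report(data_codes, layer_codes)
print(f"matched {len(matched)} of {len(data_codes)}; unmatched: {unmatched}")
```

Any code left in the unmatched list is a record that would fail to join to a county, so it is worth fixing in the source file before the upload.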



GeoCommons offers some nice built-in features that assist with categorizing measures.  The application will produce summary statistics for numeric fields (Figure 3), which gives you a quick picture of your sample and can assist with how to categorize the data.  Click the “Make a Map” button to proceed to the interactive interface.

Figure 3

Also note the filter tab, which allows you to screen out groups of cases.  For example, I may request a minimum number of farms to screen out urban counties.

Figure 4 shows a standard choropleth map portraying median number of acres per farm by county for North Carolina in 2007.  In this example, I have classified counties into five groups using standard deviations to group counties.



Figure 4
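A standard deviation classification like the one in Figure 4 can be sketched as follows. The exact rule GeoCommons applies is not documented here, so this example assumes one common convention: five classes, each one standard deviation wide, centered on the mean, with cut points at the mean plus -1.5, -0.5, 0.5 and 1.5 standard deviations. The function names and data values are hypothetical.

```python
import bisect
import statistics


def stddev_breaks(values):
    """Return four cut points at mean + k * sd for k = -1.5, -0.5, 0.5, 1.5."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [mean + k * sd for k in (-1.5, -0.5, 0.5, 1.5)]


def classify(value, breaks):
    """Return the 0-based class index (0 = far below the mean, 4 = far above)."""
    return bisect.bisect_right(breaks, value)


# Hypothetical values, e.g. median acres per farm for eight counties.
acres = [40, 55, 62, 70, 78, 85, 96, 150]

sd_breaks = stddev_breaks(acres)
sd_classes = [classify(v, sd_breaks) for v in acres]

print("cut points:", [round(b, 1) for b in sd_breaks])
print("classes:   ", sd_classes)
```

Unlike quantile classes, these groups can hold very different numbers of counties. The middle class collects everything near the mean, and the outer classes flag unusual values.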

GeoCommons contains a wide variety of ways to share data (accessed through the About section).  Posting to Twitter, Facebook, and an array of other social media sites is possible with a few short clicks.  You can directly email a link to the map along with a short personal message right out of the application.  For those who wish to post to a web page, GeoCommons provides two ways to insert a map, through a <div> tag and through an iframe.  All code is generated for copy and paste into your page.

To access a version of this map, simply follow this link.

Finally, GeoCommons will produce a PNG image and a KML document for download.  The image export feature appears to be relatively new and does take trial-and-error to align correctly.  In addition, it does not appear to include any base layers or legends in the output, only the data layer.


Other Notes

When using the standard deviation and maximum breaks methods for grouping observations, double check the category definitions by changing the number of categories and noting how the definitions of the new groups change.  This will help to confirm whether data are grouped appropriately and exactly what the definition of each category is.

Data and GIS Winter Newsletter 2012

Data driven teaching and research at Duke keeps growing and Perkins Data and GIS continues to increase support for researchers and classes employing data, GIS, and data visualization tools.  Whether your discipline is in the Humanities, Sciences, or Social Sciences, Perkins Data and GIS seeks to support researchers and students using numeric and geospatial data across the disciplines.

New Website for 2012

  • Online data or digital maps that you need for your project
  • A workshop on the latest software packages and digital tools

New workshops for 2012
Clean your data with Google Refine. Learn about data management planning. Visualize your data with Tableau Public, or map your results using ArcGIS or Google Earth Pro.  A new series of workshops connects traditional statistical, geospatial, and visualization tools with web based options.

  • StataReview                               (Statistics/Data Management)
  • Introduction to ArcGIS           (Geographic Information Systems / Data Visualization)
  • Data Management Planning  (Data Management/Grants)
  • Geocommons                            (Geographic Information Systems / Data Visualization)
  • Google Earth (Pro)                   (Geographic Information Systems / Data Visualization)
  • Google Refine                           (Data Management/Descriptive Statistics)
  • Tableau Public                          (Data Visualization)

Bloomberg (terminals) have arrived

Duke Libraries is pleased to announce the installation of three Bloomberg financial terminals in the Data and GIS Lab in 226 Perkins.  The terminals provide the latest news and financial data and include an application that makes it easy to export data to Excel.  Access is restricted to current Duke affiliates.

Get help with Data Management Planning

Data and GIS has launched a new guide that provides guidance for researchers looking for advice on data management plans now required by several granting agencies.  The guide provides examples of sample plans, key concepts involved in writing a plan, and contact information for groups on campus providing data management advice.

New Collections
Explore the Indonesian Village Potential Statistics (PODES), look at household economic behavior in the Indian National Sample Survey, or explore historical digital maps of Europe. The Data and GIS collection includes research data sets and maps of interest to the Duke community covering a wide range of topics.

Support for Restricted Data Contracts and Restricted Data Licensing
Perkins Library has partnered with the Social Science Research Institute (SSRI) to support restricted data licensing with Paul Pooley as a restricted data specialist.  Paul is available to work with researchers licensing restricted data and negotiating restricted data management plans.  Please contact Paul for more details.

Converting ArcGIS Layers to Google Earth (KML)

Converting ArcGIS layers to Google Earth allows others to easily see layers without specialized software.  Both ArcGIS and Google Earth Pro contain tools that allow conversion to and saving in KML format.
Note: Be certain you are allowed to share layers if they were not created by you.

Conversion using ArcGIS

  • First, open the layer that you wish to convert.
  • In the ArcToolbox window, expand “Conversion Tools,” then “To KML,” and select “Layer to KML.”
  • When the “Layer to KML” window appears, first select the shapefile or layer for the “Layer” box.
  • Next select a directory for the file to be created and provide a name for the file.
  • Finally, you must enter a number for the “Layer Output Scale.”  If your layer has a scale-dependent renderer, this setting allows you to export the KML at a specific level of resolution.  Otherwise, it has no effect, whatever the number.

For layers with many features, ArcGIS may produce a KML file that does not open in Google Earth due to errors.  There are two ways to solve this problem.

  • First, you can split your shapefile into several smaller shapefiles.
  • Second, you can (usually) convert the shapefile to KML with Google Earth Pro.
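The first option, splitting a large layer into smaller pieces, comes down to chunking the feature list. The sketch below shows only that chunking logic in plain Python. It does not read or write shapefiles, the function name is hypothetical, and the chunk size of 2,500 is an arbitrary choice for illustration.

```python
def split_features(features, max_per_file):
    """Return consecutive chunks of at most max_per_file features each."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    return [
        features[start:start + max_per_file]
        for start in range(0, len(features), max_per_file)
    ]


# Hypothetical layer with 6,000 features, represented here by their IDs.
feature_ids = list(range(6000))
chunks = split_features(feature_ids, max_per_file=2500)

for number, chunk in enumerate(chunks, start=1):
    print(f"part {number}: {len(chunk)} features")
```

Each chunk would then be saved as its own shapefile and converted separately, so that no single KML file has to hold the whole layer.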

Conversion using Google Earth Pro

  • First, open the shapefile with the Open command.  Be certain to change the file type to “ESRI Shapefile”.
  • When opened, you will receive a warning if your shapefile contains more than 2,500 items.  You will still possess the ability to import the entire file, but it may take some time.
  • You will be asked whether you wish to apply a style template to the document.  If you do so, you will be able to choose the attribute that contains the item name (for example, the address field or the street name field).
    Note: you don’t have to save the style template to select the name field.
  • Finally, right-click the layer added to the Temporary Places folder, and click “Save Place As.”  Provide a location and file name for the file to be created.
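To see what these tools produce, and why choosing a name field matters, here is a minimal sketch that writes point features to KML with Python's standard library. It is not how ArcGIS or Google Earth Pro perform the conversion. The function name, the feature records, and the "lon" and "lat" keys are hypothetical, and only point geometry is handled.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # namespace defined by the KML 2.2 standard


def features_to_kml(features, name_field):
    """Return a KML document string with one Placemark per point feature."""
    ET.register_namespace("", KML_NS)  # emit KML_NS as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for feature in features:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        # The chosen attribute becomes the label shown in Google Earth.
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(feature[name_field])
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists coordinates as longitude,latitude (not latitude,longitude).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{feature['lon']},{feature['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical features with made-up names and coordinates.
features = [
    {"street": "Building A", "lon": -78.9382, "lat": 36.0014},
    {"street": "Building B", "lon": -78.9400, "lat": 36.0030},
]

kml_text = features_to_kml(features, name_field="street")
print(kml_text)
```

Saving the returned string to a file with a .kml extension gives a document that Google Earth can open, with each placemark labeled by the chosen field.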

What’s hot in molecular biology databases

The journal Nucleic Acids Research has just published its 18th annual database issue. The current issue summarizes 96 new and 83 previously reviewed molecular biology databases, including GenBank, ENA, DDBJ, and GEO. Also included in the issue is an editorial advocating the creation of a “community-defined, uniform, generic description of the core attributes of biological databases,” which would be known as the BioDBCore checklist. Such a checklist would benefit both database users and providers: users would have a much easier time finding the appropriate resource and providers would be able to highlight specialized resources and the lesser known functionality of established databases.

Besides the databases reviewed in the current issue, Nucleic Acids Research maintains a select list of 1330 molecular biology databases that have been profiled in various database issues over the past 18 years.