Category Archives: Data Visualization

Data and GIS Back to School – Fall 2012

Visualize your data, analyze your results, map your statistics, and find the data you need!  Come visit us in Perkins 226 (second floor Perkins) for a consultation or contact us online (email: askdata@duke.edu or Twitter: @duke_data or @duke_vis).  We look forward to working with you on your next data-driven project.

New Data Lab Opens – August 2012

http://library.duke.edu/data/about/lab.html

With 12 workstations with dual 24″ monitors and 16 gigs of memory, the new Data and GIS lab is ready to take on the most challenging statistical, mapping, and visualization research projects.  The new lab also features a flatbed scanner for projects moving from print to digital data.  Lab hours are the same hours as Perkins Library (almost 24/7).

Visualize This!  New Data Visualization Program

Perkins Library is proud to introduce Angela Zoss, our new Data Visualization Coordinator. Schedule a consultation, attend a workshop, or learn more about research in Data Visualization at Viz Forum this fall.

New workshops for Fall 2012

http://library.duke.edu/data/news/index.html

Learn about data management planning. Apply text mining strategies to understand your documents.  Visualize your data with Tableau Public, or map your results using ArcGIS or Google Earth Pro.  A new series of workshops connects traditional statistical, geospatial, and visualization tools with web-based options.  Register online for our courses or schedule a session for your course by emailing askdata@duke.edu.

Bloomberg Professional News and Financial Data

http://blogs.library.duke.edu/data/2011/08/29/bloomberg-has-arrived/

If you missed the arrival of the Bloomberg service last fall, Duke Libraries is pleased to announce the installation of three Bloomberg financial terminals in the Data and GIS Lab in 226 Perkins.  The terminals provide the latest news and financial data and include an application that makes it easy to export data to Excel.  Access is restricted to current Duke affiliates.  Training on Bloomberg is currently being planned for the last week of September.  Please email askdata@duke.edu to reserve a space at the training session.

Get help with Data Management Planning

http://library.duke.edu/data/guides/data-management/index.html

Data and GIS has launched a new guide for researchers seeking advice on the data management plans now required by several granting agencies.  The guide provides sample plans, key concepts involved in writing a plan, and contact information for groups on campus providing data management advice.  In addition, we offer individual consultations with researchers on data management planning.

New Collections for Fall 2012

http://library.duke.edu/data/collections/new.html

Contact Us! – askdata@duke.edu

 

Online Mapping Tools – GeoCommons

Visualizing spatial data can be challenging.  Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data.  In this post, we examine in detail GeoCommons, a web-based tool for presenting spatial data.  (Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of GeoCommons.)

 

GeoCommons (geocommons.com)

GeoCommons is an online mapping application that easily imports a variety of data formats, including geospatial data, and quickly produces sharable maps.  In contrast to other mapping tools, GeoCommons contains several categorization algorithms, such as quantile classification and classification based on the standard deviation of the sample, that assist with the construction of informative maps.  CSV files and ArcGIS shapefiles are two of the most widely used file formats compatible with GeoCommons.

GeoCommons is very easy to use and contains some of the display features contained in high-end GIS suites.  Creation of new variables tied to geographies can be tricky, so it’s advised to either upload data  and map in final form or to first identify the layer to which you will upload and join a complete data set.

 

Geocoding

Figure 1

To begin geocoding, upload a file.  GeoCommons has the capability to recognize spatially encoded data.  Some formats may require user assistance.

If you’ve uploaded data that contains latitude and longitude coordinates, choose this option.  In my case, I had county FIPS codes that uniquely identified each county.  Selecting US Boundaries to the left, then USA Counties, I was able to successfully preview how well my FIPS codes matched the layer (Figures 1 and 2).  A variety of other boundary types are available.  The key is to have in your data a unique identifier that identifies each record in the same manner as an available geocoding layer.
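Before uploading, it can help to confirm that your identifiers will line up with the boundary layer. The following Python sketch is one possible pre-check, not part of GeoCommons itself. It assumes five-digit county FIPS codes, and the function names and sample values are hypothetical. It restores leading zeros that spreadsheets often strip, then reports codes that repeat or that have no counterpart in the layer.

```python
def normalize_fips(code):
    """Return a 5-character county FIPS string, restoring leading zeros."""
    return str(code).strip().zfill(5)


def check_fips_match(data_codes, layer_codes):
    """Compare FIPS codes in a data file against a boundary layer's codes."""
    data = [normalize_fips(c) for c in data_codes]
    layer = {normalize_fips(c) for c in layer_codes}
    seen = set()
    duplicates = set()
    for code in data:
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return {
        "unmatched": sorted(seen - layer),   # in the data but not in the layer
        "duplicates": sorted(duplicates),    # appear more than once in the data
    }


# Hypothetical sample values for illustration only.
report = check_fips_match(
    [37063, "37183", 1001, "37063", "99999"],
    ["37063", "37183", "01001"],
)
print(report)
```

An empty "unmatched" list and an empty "duplicates" list suggest that each record identifies exactly one geography in the layer.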

Figure 2

Review the geocoding results and select Continue to proceed.

 

Mapping

GeoCommons offers some nice built-in features that assist with categorizing measures.  The application will produce summary statistics for numeric fields (Figure 3), which gives you a quick picture of your sample and can assist with how to categorize the data.  Click the “Make a Map” button to proceed to the interactive interface.

Figure 3

Also note the filter tab, which allows you to screen out groups of cases.  For example, I may request a minimum number of farms to screen out urban counties.

Figure 4 shows a standard choropleth map portraying median number of acres per farm by county for North Carolina in 2007.  In this example, I have classified counties into five groups based on standard deviations from the mean.
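To make the idea of standard deviation classification concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not GeoCommons' documented algorithm: classes are assumed to be one standard deviation wide and centered on the mean, the sample standard deviation is used, and the acreage values are hypothetical.

```python
import bisect
import statistics


def std_dev_breaks(values, n_classes=5):
    """Return ascending class boundaries, each class one standard deviation wide.

    Assumption: classes are centered on the mean and the two outer classes
    are open-ended. The tool's own rule may differ.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    half = (n_classes - 2) / 2
    return [mean + (i - half) * sd for i in range(n_classes - 1)]


def classify(value, breaks):
    """Return the 0-based class index of a value given ascending breaks."""
    return bisect.bisect_right(breaks, value)


# Hypothetical median farm sizes in acres, for illustration only.
acres = [45, 60, 72, 88, 95, 110, 150, 230]
breaks = std_dev_breaks(acres, n_classes=5)
for value in acres:
    print(value, "-> class", classify(value, breaks))
```

Calling std_dev_breaks with a different n_classes and comparing the boundaries is a quick way to see how the group definitions shift.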

 

Sharing

Figure 4

GeoCommons contains a wide variety of ways to share data (accessed through the About section).  Posting to Twitter, Facebook, and an array of other social media sites is possible with a few short clicks.  You can directly email a link to the map along with a short personal message right out of the application.  For those who wish to post to a web page, GeoCommons provides two ways to insert a map, through a <div> tag and through an iframe.  All code is generated for copy and paste into your page.

To access a version of this map, simply follow this link.

Finally, GeoCommons will produce a PNG image and a KML document for download.  The image export feature appears to be relatively new and does take trial-and-error to align correctly.  In addition, it does not appear to include any base layers or legends in the output, only the data layer.

 

Other Notes

When using the standard deviation and maximum breaks methods for grouping observations, double-check the category definitions: change the number of categories and observe how the definitions of the resulting groups change.  This will help to confirm whether data are grouped appropriately and exactly what the definitions for each category are.

New Data Visualization Services at Data and GIS

Data visualization has a long history, but disciplines employ visualization for different purposes and with varying levels of complexity.  Visualizations can be compelling or confusing, engaging or enraging.  For researchers without prior experience with visualization, the cost of incorporating new techniques into an existing research program may be daunting.

A stacked area graph.

The Data & GIS Services Department of Perkins Library can help with data visualization at various scales and in any discipline.  Angela Zoss, Duke’s new full-time Data Visualization Coordinator, has arrived and is available for consultation.  Her role will be to provide visualization support for the Duke University community and to help centralize visualization resources and infrastructure.

A U.S. map with a data overlay of circular icons.

In addition to the existing mapping services and visualization workshops that have been offered for some time, this fall will bring new visualization workshops, instructional material, and web resources to assist with various components of the research process (e.g., data processing and analysis, software selection, post production).  Look for information not only on producing visualizations but also on opportunities for showcasing visualizations and research across campus. Our new visualization twitter feed (@duke_vis) will also be used to circulate tutorials, example visualizations, and other news and events related to visualization.

A network visualization.

There is no better time to start exploring what visualization can offer!  Stop by Perkins during our walk-in hours or send an email to askdata@duke.edu for consultation, or get in touch with Angela directly to learn more about the new visualization services.

Online Mapping Tools – Tableau Public 7

Visualizing spatial data can be challenging to learn. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently and provide relatively easy alternatives for the display of spatial data. In this ongoing series of alternatives, we review Tableau Public 7 in detail.  Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of Tableau 6.1.

 

Tableau Public (link)

Figure 1

Tableau Public is a free software application that allows you to easily map data and share maps through email or web pages by embeddable script. To use Tableau, you must download and install a free desktop application. Tableau Public also requires a free registration to share visualizations created in the software.

Tableau is designed to look and feel like a standard spreadsheet application. Geographic mapping is accomplished by dragging your coordinate fields and dropping them into the columns and rows fields (see Figure 1). In Tableau 7, you may also select “Filled Map” under the Marks panel, and select a geographic identifier for the “Level of Detail” field (see Figure 2). Once done, add the variable to color by to the color field. In these examples, more intense colors indicate larger median farm size, measured in acres.

Figure 2

 

Geocoding

Tableau generates new fields that hold coordinate data as it imports and geocodes your data. If you wish to create filled maps (states, counties, etc.) in Tableau 7, you must additionally have geographic identifiers that are unique for each case. In Figure 2, the initial map only contained 50 polygons, as 50 North Carolina counties were uniquely named within the United States.

Had I also included a state field, unique identification would have been automatic, but Tableau allowed me to define the state for each case, and lucky for me, I only had North Carolina data.
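The ambiguity described above can be checked before import. The following Python sketch is a hypothetical helper, not a Tableau feature. It assumes you have a reference list of (county, state) pairs, and the short list shown here is only a sample. It reports county names that occur in more than one state and therefore need a state field to be identified uniquely.

```python
from collections import defaultdict


def ambiguous_names(records):
    """Return county names that appear in more than one state."""
    states_by_name = defaultdict(set)
    for county, state in records:
        states_by_name[county].add(state)
    return {
        name: sorted(states)
        for name, states in states_by_name.items()
        if len(states) > 1
    }


# Hypothetical, partial reference list; a real check would use a full national list.
reference = [
    ("Wake", "NC"),
    ("Durham", "NC"),
    ("Orange", "NC"),
    ("Orange", "CA"),
    ("Orange", "FL"),
    ("Dare", "NC"),
]
print(ambiguous_names(reference))
```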

The geocoding options are extensive. The following list is not exhaustive: area codes, FIPS codes, county/state/country names, ZIP codes, and ISO country codes. Of course, any coordinate data will work for point data.

 

Sharing

Sharing on a web page is accomplished through embeddable Javascript. Sadly, I was unable to get Tableau to work within WordPress, but you may see a live version of this map by following this link.

 

Other Notes

Tableau is very easy to use, provided your data is reasonably clean. With geographic data, be certain to either have something that uniquely identifies each entity or have latitudes and longitudes. It is preferable to err on the side of including more identification fields rather than fewer (e.g., including state names in addition to counties).

Also be aware that Tableau is not backward-compatible. For example, the workbook used in this example was initially created in Tableau 6.1, modified in Tableau 7, but failed to open once I moved back to Tableau 6.1. However, irrespective of version, you will be able to see any visualizations produced in any version.

 

What’s new in ArcGIS 10?

Basemaps

Would you like to add aerial photography or a topographic map underneath map layers for visual appeal or context? With ArcGIS 10, you can add a basemap to your map project.

A basemap is a link to an online imagery data source. You must be connected to the Internet in order to see a basemap.

Basemaps contain imagery at different levels of detail. When zooming in or out, new imagery will replace old imagery, which provides an appropriate level of detail at any zoom level and improves performance by limiting the amount of information to be downloaded and displayed.

 

Export Map Packages

Sharing maps and shapefiles with others can be a pain when a map is composed of many shapefiles and layers.  A map package bundles all shapefiles, layers, and map documents into a single file that can be opened by others with ArcGIS 10.

 

Background Processing

In ArcGIS 10, ArcToolbox tools default to background processing.  This allows you to continue to work while the tool processes your data.

To disable background processing, navigate to the “Geoprocessing Options…” choice under the Geoprocessing Menu Bar, and uncheck the “Enable” box.

 

Search Toolbox Feature

Got a tool you want to use but can’t remember what toolbox it’s in?  With the Search feature, you can easily locate what you need. Your search term can be the tool name or a close approximation of what you wish to do.

 

Easy to Use Time Data

Time series data became easier to use with ArcGIS 10. Version 10 recognizes time series data with the addition of a single time field.

For example, suppose you have annual precipitation for US cities.  Your data will contain an ID field, a point field, a time field containing the year, and a field containing the precipitation amount.

For more information, see this blog post.

 

How Do I Label Individual Items?

Have you ever wanted to label individual items on a map, and avoid the cluttered appearance of labels for all features, such as that shown to the right?

ArcGIS 10 hides the tool that you use to label individual items, but it’s easy to get back.

  1. Turn on the “Labeling” toolbar under the Customize Menu Bar.
  2. At the top right corner of the toolbar, click the arrow pointed downward and click “Customize…”
  3. Select the “Commands” tab and select the “Label” category (left panel).
  4. In the right panel, drag the “Label” tool and drop it into any toolbar that you wish.

Time Series Visualizations in ArcGIS – An Introduction

Introduction

ArcGIS 10 makes it easy to manage and visualize time-series data to identify trends and create compelling visualizations.  Creating a visualization of time-series data requires only a few additional steps beyond those needed to produce any map.

Step 1: Data Formatting

Time-series data contains records, each of which is specific to both an individual and to a single point in time.  The following example uses employment data for the textile industry in North Carolina from 2000 through 2009.

In this example, “fips” corresponds to each county’s unique FIPS code, “industry” corresponds to the textile industry’s unique NAICS code representation, “t” denotes the year.  Establishments, employment, and annual pay, our data items, are stored in the fields “est”, “emp”, and “pay_ann”.  All missing values were coded ‘-1’.

Tip: Make sure each record has a value.  Records without values will not be drawn in ArcGIS.

Tip: Do not name the time field “year,” as it is a reserved name in ArcGIS.

In our experience, storing the data in a Microsoft Access database is the most reliable option.
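The formatting tips in this step can be checked before the table reaches ArcGIS. The Python sketch below is an illustrative pre-check and not an ArcGIS tool. It assumes rows are loaded as dictionaries using the field names from this example (“fips”, “t”, “est”, “emp”, “pay_ann”). The function name and sample rows are hypothetical, and the reserved-name set contains only the one name mentioned above.

```python
RESERVED_FIELD_NAMES = {"year"}  # only the name mentioned in the tip; not a complete list


def check_time_table(rows, key_field="fips", time_field="t",
                     value_fields=("est", "emp", "pay_ann")):
    """Return a list of human-readable problems found in a list of dict rows."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Field names that collide with a reserved name
        for name in row:
            if name.lower() in RESERVED_FIELD_NAMES:
                problems.append(f"row {number}: field name '{name}' is reserved")
        # Each record should be unique to one county and one point in time
        key = (row.get(key_field), row.get(time_field))
        if key in seen:
            problems.append(f"row {number}: duplicate county-year {key}")
        seen.add(key)
        # Records without values will not be drawn
        for field in value_fields:
            value = row.get(field)
            if value is None or value == "":
                problems.append(f"row {number}: '{field}' is empty")
    return problems


# Hypothetical rows for illustration only.
rows = [
    {"fips": "37001", "industry": "313", "t": 2000, "est": 12, "emp": 900, "pay_ann": 28000},
    {"fips": "37001", "industry": "313", "t": 2001, "est": 11, "emp": None, "pay_ann": 28500},
    {"fips": "37001", "industry": "313", "t": 2001, "est": 11, "emp": 850, "pay_ann": 28500},
]
problems = check_time_table(rows)
for problem in problems:
    print(problem)
```

Empty values found this way could then be recoded (this example uses -1) so that every record has a value.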

Step 2: Add Data to Map in ArcGIS

Once the data is formatted, join the data to a geographic layer.  For help in finding a geographic layer, please consult the Perkins Data and GIS Services Department.

Tip: When joining layers, it is good practice to Verify the join selection before approving.  The program will inform you of any errors.

Step 3: Enabling Time

Once the data are joined to a layer, enter the layer properties by right-clicking the layer name in the Table of Contents pane.

Navigate to the Time tab and check the box.  ArcGIS will want to know which field contains time information, as well as the format.  If the join was successful, you will see the fields that represent the data joined to the geographic layer.  In this example, the time field is labeled “t”.

You must also specify the date/time format.  Available time formats are listed to the right.

Finally, you will have to enable time on the data table as well.  To do this, right-click the data table in the Table of Contents pane.  Follow the same steps as presented for the geographic layer.

Step 4: Enable Time Display

Now that ArcGIS understands the data structure, you may enable time visualization.  The “Tools” toolbar, which contains the most commonly used tools, contains the button highlighted below, “Open Time Slider Window”.  Select this button.

The time slider window (left) will appear.  The slider spans the time range of the data, identifies what point in this range is currently displayed on the map, and allows for access to a variety of playback and recording options.  To access these options, click the options button.

This button is the equivalent of “Play.”  It will display the data from the first time point to the last.

Buttons with both arrows and vertical lines are one-step increments.  This particular button moves forward one time increment, the other one moves back.

This button exports the display to video.  This is the final step.

Step 5: Configure Options and Visual Display

Before you export to video, you will want to configure the appearance of the map.  This example will focus on new options that come with time series data.

First, select “Options” in the Time Slider toolbar.  Under the “Time Display” tab, you can alter the format of the displayed date to conform to your data.  In this example, I selected 2011 (yyyy) because we are using annual data.

Second, under the “Playback” tab, you can specify a length of time for playback.  This example contains 10 years of data.  If I specify 5 seconds playback, each data year will be displayed for one-half second.  If I specify 10 seconds, each year will be visible for 1 second.
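The playback arithmetic is simple division. A minimal Python sketch follows, assuming the playback time is spread evenly across time points as described above. The function name is hypothetical.

```python
def seconds_per_time_point(playback_seconds, n_time_points):
    """Return how long each time point stays on screen during playback."""
    if n_time_points <= 0:
        raise ValueError("n_time_points must be positive")
    return playback_seconds / n_time_points


# Ten years of annual data, as in this example
print(seconds_per_time_point(5, 10))
print(seconds_per_time_point(10, 10))
```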

Third, I will display the year in order to make clear to the viewer the time point that is visible.  To do this, I will go to “Insert” “Dynamic Text” “Data Frame Time.”

Tip: Alternatively, you can insert the data frame time into the title or other display object by including the following in the text of the object: <dyn type="dataFrame" name="Layers" property="time" emptyStr="[off]"/>

After some trial and error, I successfully integrated the time currently visible into the title.  The image to the left shows its appearance.

Step 6: Export to Video

Once the appearance of the map is satisfactory, you can export the map to video or to sequential images.  Click the “Export to Video” button on the time slider window.

Tip: maximize the ArcGIS window, switch to Layout View, zoom the layout to 100%, and clear any toolbars that may obstruct the layout view to improve video appearance.

First, you will be asked for a file or folder location and the export format.  Videos are exported as AVI files, while sequential images are exported to a folder either as bitmaps or JPEGs.

Second, if you exported to video, you will be asked to select a codec, which essentially encodes and compresses the outputted video.  The codec selection depends on the individual machine, and some codecs work with ArcGIS better than others.

Finally, you may have to produce a video several times before it comes out as expected.  Be sure to watch for missing time points, as this frequently happens.  Fixing the video length to a specific play duration per time point (one-half second or one second) helps you watch for these missing time points.

The following example is a 5-second video that displays employment in the textiles industry in North Carolina from 2000 through 2009.  Note that declining employment is signified by colors that change from dark to light.

Where There’s Smoke …

A team of Duke undergraduates participating in the Global Health Capstone course was awarded the “Outstanding Capstone Research Project” for their examination of state and congressional district characteristics that might influence the outcome of legislative efforts to raise cigarette excise taxes in North Carolina, South Carolina, and Mississippi.  Sarah Chapin and Gregory Morrison used GIS mapping tools in the Library’s Data & GIS Services Department to illuminate the relationships between county demographics and state legislators’ votes for or against cigarette tax hikes. Brian Clement, Alexa Monroy, and Katherine Roemer were other members of the research group.  Congratulations!

Regional Focus
The recent cigarette excise tax increases in Mississippi (2009), North Carolina (2009), and South Carolina (2010) served as case studies from which to draw components of successful strategies to develop a regional legislative toolkit for those wishing to increase cigarette excise taxes in the Southeast.  In all of these states, the tax increase was controversial. The Southeast in general is tax averse, which presents a systemic challenge to those who advocate raising taxes on cigarettes.

Senate Votes & Poverty by County

The researchers examined state characteristics which might influence the outcome of efforts to raise excise taxes, such as coalitions for and against proposed increases, the facts each side brought to bear and the nature of the discourse mobilized by different groups, the economic impact in each state of both smoking and the proposed excise taxes, and local political realities. The students restricted the area of interest to the Southeast because this region has a shared history and, consequently, similar challenges when it comes to race, poverty, and rural populations. These states are also, broadly speaking, politically similar and have had a similar experience with both tobacco use and government regulation.

This multi-disciplinary analysis provides a reference point for state legislators or interest groups wishing to pass cigarette tax increases.  The deliverable provided a model of past voting trends, suggestions for framing political dimensions of the issue, and strategies to overcome opposition in state legislatures.

Comparing Legislative Districts and County Data
Senate Votes & Party Affiliation

The bulk of the research involved mapping the political landscape surrounding cigarette tax legislation.  In doing so, researchers looked at voting records, interest group politics, campaigns, and state ideology. Broadly, the research entailed charting the electoral geography by overlaying state house and senate districts with county-level data.  Districts were coded based on voting history, party affiliation, smoking rates, and constituent demographics.  State legislature websites were used to find representatives’ voting histories, allowing the researchers to match legislators by county when constructing a GIS dataset.  County party affiliations are available through the state board of elections.  Finally, county demographics came from the 2010 Census data.

Senate Votes & Percent Black by County

Overcoming Ideology
Besides using GIS mapping to illustrate these relationships, the researchers analyzed the involvement of major interest groups, specifically, lobbying expenditures and campaign contributions to map the involvement of both pro- and anti-tobacco interest groups.  Additionally, they examined the impact of state ideology on the framing of political dimensions, looking at editorials, opinion pieces, newspapers, and committee markups, as well as interviews (both previous interviews and ones they conducted) with state legislators and interest groups.  Overcoming state ideology, both political and social, is a major factor in passing cigarette excise tax legislation, especially in a region with such dominant tobacco influence.

Again, the purpose of the research is not merely to understand the political landscapes surrounding the passage of cigarette tax bills, but to apply these findings to the creation of a legislative toolbox for representatives or interest groups concerned with pushing similar legislation.

Swimming in a Sea of Data

This post comes from Erika Kociolek, a second year Master in Environmental Management student at the Nicholas School.  The Data and GIS staff want to congratulate Erika on successfully defending her project!

For about 4 months, I’ve been swimming in a proverbial sea of data related to hypoxia (low dissolved oxygen concentrations) and landings in the Gulf of Mexico brown shrimp fishery.  I’m a second year master of environmental management (MEM) student at the Nicholas School, focusing on Environmental Economics and Policy.  I’ve been working with my advisor, Dr. Lori Bennear, to complete my master’s project (MP), an analysis attempting to estimate the effect of hypoxia  on landings and other economic outcomes of interest.

To do this, we are using data from the Southeast Monitoring and Assessment Program (SEAMAP), NOAA/NMFS, and a database of laws and policies related to brown shrimp that I compiled in Fall 2010.  By running regressions that difference out all variation in catch except for that attributable to hypoxia, we can isolate its effect on economic outcomes of interest.  I’ve found that catch, revenue, catch per unit effort, and revenue per unit effort are all larger in the presence of summer hypoxia.  However, if we look at catch for different sizes of shrimp, we see that in the presence of summer hypoxia, catch of larger shrimp decreases and catch of smaller shrimp increases significantly.

Getting to the point of discussing results has required a bunch of data analysis, cleaning, management, and visualization.  I used R, STATA, ArcGIS, and have even used video editing software to make dynamic graphics representing my results that have improved my own understanding of the raw data.  As an example, the video below, showing the change in hypoxia over time (1997-2004), was created using ArcGIS 10.

http://youtu.be/2YfYBE_Fe7U

Note: The maps in the video above use data from the Southeast Monitoring and Assessment Program (SEAMAP).

Hypoxia is a dynamic and complex phenomenon, varying in severity, over time, and in space; hypoxia in Gulf waters is more severe and widespread in summer.  The model I’m using actually takes advantage of this variation to obtain an estimate of the effect of hypoxia on catch and other economic outcomes.  To show people the source of variation I’m exploiting, I created this video.  These maps are drawing on data of dissolved oxygen concentrations and displaying it spatially.

We have dissolved oxygen measurements for most of the Gulf in the summer (June) and fall (December).  Each subarea-depth zone (see related map) that changes from salmon shading (not hypoxic) to red (hypoxic), or vice-versa, is variation in hypoxia that the models I’m running use to get an estimate of the hypothesized effect.

Many thanks are due to my advisor, Dr. Bennear, as well as to the helpful folks at the Data/GIS lab, who have provided invaluable assistance with the data management and data visualization components of this project!

This research was funded by NOAA’s National Center for Coastal Ocean Science, Award #NA09NOS4780235.

Wrangle, Refine, and Represent

Data visualization and data management represented the core themes of the 2011 Computer Assisted Reporting (CAR) Conference that met in Raleigh from February 24-27.  Bringing together journalists, computer scientists, and faculty, the conference united a number of communities that share a common interest in gathering and representing empirical evidence online (and in print).

While the conference featured luminaries in data visualization (Amanda Cox, David Huynh, Michal Migurski, Martin Wattenberg) who gave sage advice on how to best represent data online, web-based data visualization tools provided a central focus for the conference.

Notable tools that may be of interest to the Duke research (and teaching) community include:

DataWrangler – An interactive data cleaning tool much like Google Refine (see below)

Google Fusion Tables – “manage large collections of tabular data in the cloud” – Fusion Tables provides convenient access to Google’s data visualization and mapping services.  The service also allows groups to annotate data online.

Google Refine – Refine is primarily a data cleaning tool that simplifies the process of cleaning data for further processing or analysis.  While users of existing data management tools may not be convinced to leave their current data management tool, Refine provides a rich suite of tools that will likely attract many new converts.

Many Eyes – One of the premier online visualization tools hosted by IBM.  Visualizations range from pie charts to digital maps to text analysis.  Many Eyes’ versatility is one of its key strengths.

Polymaps – Billed as a “javascript library for image- and vector-tiled maps” – Polymaps allows the creation of custom lightweight map services on the web.

SIMILE Project (Semantic Interoperability of Metadata and Information in unLike Environments) – The SIMILE Project is a collection of different research projects designed to “enhance inter-operability” among digital assets.  At the conference, the Exhibit Project received particular attention for its ability to produce data-rich visualizations with very little coding required.

Timeflow – Presented by Sarah Cohen and designed by Martin Wattenberg, Timeflow provides a convenient application for visualizing temporal data.