Category Archives: tutorial

Fall Data and Visualization Workshops

2017 Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series.  Our workshops are designed for researchers who are new to data driven research as well as those looking to expand skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of different data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro


11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 


09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 













Mapping in Google Spreadsheets

Screen Shot 2014-06-04 at 4.33.57 PMHere at Data & GIS Services, we love finding new ways to map things.  Earlier this semester I was researching how the Sheets tool in Google Drive could be used as a quick and easy visualization tool when I re-discovered its simple map functionality.  While there are plenty of more powerful mapping tools if you want to have a lot of features (e.g., ArcGIS, QGIS, Google Fusion Tables, Google Earth, GeoCommons, Tableau, CartoDB), you might consider just sticking with a spreadsheet for some of your simpler projects.

I’ve created a few examples in a public Google Sheet, so you can see what the data and final maps look like.  If you’d like to try creating these maps yourself, you can use this template (you’ll have to log into your Google account first, and then click on the “Use this template” button to get your own copy of the spreadsheet).

Organizing Your Data

The main thing to remember when trying to create any map or chart in a Google sheet is that the tool is very particular about the order of columns.  For any map, you will need (exactly) two columns.  According to the error message that pops up if your columns are problematic: “The first column should contain location names or addresses. The second column should contain numeric values.”

Of course, I was curious about what counts as “location names” and wanted to test the limits of this GeoMap chart.  If you have any experience with the Google Charts API, you might expect the Google Sheet GeoMap chart to work like the Geo Chart offered there.  In the spreadsheet, however, you have only a small set of options compared to the charts API.  You do have two map options — a “region” (or choropleth) map and a “marker” (or proportional symbol) map — but the choices for color shading and bubble size are built-in or limited.

Screen Shot 2014-06-04 at 4.36.54 PMRegion maps (Choropleths)

Region maps are fairly restrictive, because Google needs to know the exact boundary of the country or state that you’re interested in.  In a nutshell, a region map can either use country names (or abbreviations) or state names (or abbreviations).  The ISO 3166-1 alpha-2 codes seem to work exceptionally well for countries (blazing fast speeds!), but the full country name works well, too.  For US states, I also recommend the two letter state abbreviation instead of the full state name. If you ever want to switch the map from “region” to “marker”, the abbreviations are much more specific than the name of the state.  (For example, when I switch my “2008 US pres election” map to marker, Washington state turns into a bubble over Washington DC.)

Screen Shot 2014-06-04 at 4.37.57 PMMarker maps (Proportional symbol maps)

Marker maps, on the other hand, allow for much more flexibility.  In fact, the marker map in Google Sheets will actually geocode street addresses for you.  In general, the marker map will work best if the first column (the location column) includes information that is as specific as possible.  As I mentioned before, the word “Washington” will go through a search engine and will get matched to Washington DC before Washington state.  Same with New York.  But the marker map will basically do the search on any text, so the spreadsheet cell can say “NY”, or “100 State Street, Ithaca, NY”, or even the specific latitude and longitude of a place. (See the “World Capitals with lat/lon” sheet; I just put latitude and longitude in a single column, separated with a comma.)  As long as the location information is in a single column, it should work, but the more specific the information is, the better.


Screen Shot 2014-06-04 at 4.31.56 PMWhen you have your data ready and want to create a map, just select the correct two columns in your spreadsheet, making sure that the first one has appropriate location information and the second one has some kind of numerical data.  Then click on the “Insert” menu and go down to “Chart…”  You’ll get the chart editor.  The first screen will be the “Start” tab, and Google will try to guess what chart you’re trying to use.  It probably won’t guess a map on the first try, so just click on the “Charts” tab at the top to manually select a map.  Map is one of the lower options on the left hand side, and then you’ll be given a choice between the regions and markers maps.  After you select the map, you can either stick with the defaults or go straight to the final tab, “Customize,” to change the colors or to zoom your map into a different region.  (NB: As far as I can tell, the only regions that actually work are “World,” “United States,” “Europe,” and “Asia”.)

Screen Shot 2014-06-04 at 4.33.35 PMThe default color scale goes from red to white to green.  You’ll notice that the maps automatically have a “mid” value for the color.  If you’d rather go straight from white to a dark color, just choose something in the middle for the “mid” color.

And there you have it!  You can’t change anything beyond the region and the colors, so once you’ve customized those you can click “Update” and check out your map.  Don’t like something?  Click on the map and a little arrow will appear in the upper right corner.  Click there to open the menu, then click on “Advanced edit…” to get back to the chart editor.  If you want a bigger version of the map, you can select “Move to own sheet…” from that same menu.

Pros and Cons

So, what are these maps good for?  Well, firstly, they’re great if you have state or country data and you want a really quick view of the trends or errors in the data.  Maybe you have a country missing and you didn’t even realize it.  Maybe one of the values has an extra zero at the end and is much larger than expected.  This kind of quick and dirty map might be exactly what you need to do some initial exploration of your data, all while staying in a spreadsheet program.

Another good use of this tool is to make a map where you need to geocode addresses but also have proportional symbols.  Google Fusion Tables will geocode addresses for you, but it is best for point maps where all the points are the same size or for density maps that calculate how tightly clusters those points are.  If you want the points to be sized (and colored) according to a data variable, this is possibly the easiest geocoder I’ve found.  It’ll take a while to search for all of the locations, though, and there is probably an upper limit of a couple of hundred rows.

If this isn’t the tool for you, don’t despair!  Make an appointment through email ( or stop in to see us (walk-in schedule) to learn about other mapping tools, or you can even check out these 7 Ways to Make a Google Map Using Spreadsheet Data.

Online Data Mapping Tools – Google Fusion Tables

Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine Google Fusion Tables, which combines visualizations, including spatial visualizations, with a database back end. The key advantages to Fusion Tables are easy display of latitude/longitude data or data that is included with address information. In addition, Fusion Tables provides a one-stop location for producing visualizations other than maps, such as line charts or tables.

Uploading Data

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8

Uploaded to Fusion Tables is easy through Google Docs.  Simply log in if you have an account, create a new Table, and on the next screen, point to the file you wish to upload.  Excel and CSV files are the two most commonly used, and KML files allow for upload of maps that contain spatial information, such as locations or polygon definitions.

One thing to note about Google products is that they are often in a state of flux.  Limits and restrictions noted below may change in the future.  For further information regarding Google Fusion Tables, please consult this Libguide authored by Mark Thomas.

Geocoding Addresses

A complete list of geographic data types can be found at the Google support site.  In this post, two of the more common geocoding types will be addressed, address data and data that applies to states, counties, and similar objects.

Address data is pretty easy to work with.  Addresses should contain as much information as possible with items separated by spaces only, no commas.  For example, 134 Chapel Drive Durham NC 27708 should produce a pretty good geocoding result.  In the following example, Durham gun crimes for 2011 were downloaded from the Durham Police Department.  The data only came with address information, so city and state data were subsequently added and combined in Excel (location field).  In Figure 1, highlighted fields indicate spatial information.

To geocode the addresses, select “Map” under the “Visualize” tab.  The program will automatically begin geocoding based on the left-most field containing spatial information, which is city in this case.  Changing the field to “location,” which contains the full address information, will correctly geocode these addresses (Figure 2).

Once complete, the geocoded points are plotted on a map (Figure 3).  As with other mapping applications, you may apply a symbology to the points in order to visualize your data.

This particular dataset contains a numeric field that identifies 5 general types of crime (crime_cat_num).  Under the “Configure styles” link at the top, navigate to “Buckets” and divided the data into five buckets (Figure 4).

Once saved, each color will represent a different type of crime and shown in Figure 5 (red indicates robbery, yellow, assault and so on).

Note that only numeric fields can be used to categorize data, so you may wish to create these fields prior to upload.

Geocoding Polygons

Working with polygon data can be a bit trickier because the polygons must be spatially defined.  Fusion Tables does this by using kml, which is basically a large piece of text containing all of the coordinates, in order, that define a boundary.  For example, in this table, each boundary is defined in the geometry field. Google provides a variety of boundary types, which are available here.  If your data match one of these existing boundary types, you may upload data and merge it with the correct table, which will basically import the boundary definitions into your dataset.  Otherwise, you will have to locate suitable boundaries in a kml file and import those boundaries before merging.

This dataset displays acreage and farms for each county in North Carolina and originally came from the Census of Agriculture.  Note that there must be a field in common between your data and the data containing boundary definitions in order to merge.  In addition, merge fields can only be text fields.  FIPS codes uniquely identify counties and are contained in both tables.  Unfortunately, Google didn’t set up their FIPS fields correctly, so a cleaned up North Carolina county file is located here.

Next, click the merge tab.  Copy and paste the URL for the boundaries table and click the “Get” button.  In Figure 6, I merged  my data to the boundary file using the fips field, which is called “fips” in table 1 and “GEO_ID2” in table 2.  A merge will produce a new table, so be sure to name that new table at the bottom.Once complete, styling the map is comparable to point data. First, select “Map” under the “Visualize” tab, and be sure to point the location field at the top left to “geometry” where the boundary definitions are stored. Next, click the configure styles link. Then, select Fill color under the Polygons section.

In Figure 7, I am showing median farm size (in acres) along a gradient.  It’s important to note the lower and upper limits to your data in advance as the program will not automatically sense this.  In this case, median farm size ranges from 10 to 191.  Figure 8 shows the output.

Sharing Maps

As withsimilar online programs, Fusion Tables allows sharing of data and maps through a variety of avenues, from links to embeddable script to email.  The links below point to the two maps produced in this posting.

Durham gun crimes map

North Carolina median farm acreage, 2007

Other Notes

This overview provides only a brief introduction to the mapping capabilities of Fusion Tables.  A broad gallery of applications is located at this site, and it contains a variety of geography types.  Some of these use the Fusion Tables API, which is a nice feature that allows for application development with some programming experience.  As with the other tools reviewed by this blog, non-standard boundaries are generally absent and can be difficult to locate.  For example, a researcher with country-level data from the 1700s may have difficulty finding a country border map from that time.  However, maps are available on Fusion Tables for counties, states, countries, and congressional districts, and additional maps can be found on the Internet.

Adding Colored Regions to Excel Charts

Time series data is easy to display as a line chart, but drawing an interesting story out of the data may be difficult without additional description or clever labeling. One option, however, is to add regions to your time series charts to indicate historical periods or visualization binary data.

Here is an example where a chart of annual U.S. national economic indicators has been enhanced with regions that also indicate contractions in the U.S. business cycle – roughly speaking, economic recessions.

A time series with colored regions in the background, created in Excel.

To create this chart, all of the indicators were averaged by year and, where necessary, adjusted for inflation using a conversion factor. Download the time series data Excel file for the data and the chart to follow along.

First, to set up the basic line chart, hold Ctrl (PC) or Cmd (Mac) while you select the following columns:

  • D (stock price index over CF…)
  • E (avg. annual unemployment…)
  • G (GDP over CF…)
  • I (debt over CF…)
  • K (interest rate * 10)
  • L (years economy is in decline)

Youll notice that the columns are color coded. Some colors apply to multiple columns; this is because the values that appear on the chart have been calculated by transforming the raw data in some way. Each line on the final chart thus corresponds to one or more columns of data used to produce the values. Transforming the values helps us by normalizing the values (i.e., adjusting for inflation) or scaling the data series itself (making it possible to see the relationships between many different indicators on a single graph, despite wide variations in the ranges of values).

When we select the six columns above and insert a line chart, we get a rather ugly line chart.

The chart as it looks with the default Excel settings.

We’ll make several changes to improve this:

  • Change the “years… in decline” series to an area chart
  • Select and adjust the x axis labels and ticks
  • Adjust the y axis range
  • Customize the color, label, and order of the data series

The basic mechanism of the colored regions on the chart is to use Excel’s “area chart” to create rectangular areas. The area chart essentially takes a line chart and fills the area under the line with a color. If we have a continuous horizontal line as a data series, we will create a large colored rectangle on the chart. To have breaks in the rectangle, we simply need to leave some of the years blank (without values in the cells). To select the appropriate values for column L, we first found the maximum value for the other data series and determined that a value of 20 would create bars starting above the other data series.

To produce the colored regions that indicate contractions in the business cycle, we take the series that was created from column L and turn it into an area chart.

  1. Right-click on any data point in the series or on the legend entry
  2. Select “Change Series Chart Type…”
    Changing the chart type for a particular data series.
  3. Select the standard Area chart from the ribbon
    Select Area from the ribbon.

The chart now fills in the area under the original lines with a default fill color.

After changing one series to an Area chart.

At this point, you can right click on the series again, select “Format Data Series…”, and change the Fill color to a light gray.

Changing the fill color of the Area series.

Next, we tell the x axis what the correct labels are (the “Year” column) and have the labels show up every 4 years. (Our data series start on an election year, so the labels will always appear on election years.)

      1. Right click inside the chart somewhere and select “Select Data…”
      2. Select any of the data series in the “Series” list, then go over to the “Category (X) axis labels” box and select the “Year” column. Click “OK”.
        Changing the X axis labels.
      3. Right-click on the x axis and select “Format Axis…”.
      4. Under “Scale”:
        1. Change the default interval between labels from 3 to 4
        2. Change the interval between tick marks to 4 as well
        3. Uncheck the box next to “Vertical axis crosses between categories”
          Changing the X axis scale.
      5. Under “Text Box”, select the text direction of “Rotate to 90 deg Counterclockwise”. Click “OK”.
        Changing the X axis text.

The x axis should have appropriate year labels now. The y axis can similarly be adjusted to show just the range of values we’re most interested in.

      1. Right-click on the axis and select “Format Axis…”.
      2. Under “Scale”, unselect the check box next to “Maximum:” and change the value to 20.
        Changing the Y axis scale.

The rest of the changes are simply formatting changes. Right-click on the individual data series to change the colors, line widths, etc. Use the formatting options or the Chart tools on the Excel ribbon to change the font of any text, adjust the grid lines, add labels and titles, etc. The data series names in the legend can be adjusted by using the “Select Data…” option and typing in custom text in the “Name” field.

The final product should have colored regions and look something like the chart below.

A time series with colored regions in the background, created in Excel.

In another post, we will show how to spice this chart up even more using Adobe Illustrator.