The end of the spring semester always brings presentations of final projects, some of which may have been in the works since the fall or even the summer. Tableau, a software application designed specially for visualization, is a great option for projects that would benefit from interactive charts and maps.
Visualizing survey data, however, can be a bit of a pain. If your project uses Qualtrics, for example, you may be having trouble getting the data ready for visualization and analysis. Qualtrics is an extremely powerful survey tool, but the data it creates can be very complicated, and typical data analysis tools aren’t designed to handle that complexity.
Luckily, here at Duke, Tableau users can use Tableau’s Web Data Connector to pull Quatrics data directly into Tableau! It’s so easy, you may never analyze your Qualtrics data another way again.
Click on your account icon in the upper-right corner
Select Account Settings…
On the Account Settings page, click on the Qualtrics IDs tab
Under API, check for a token. If you don’t have one yet, click on Generate Token
Copy your token
In Tableau (Public or Desktop): Paste your API token
Paste in your API token and click the Login button
Select the data fields you would like to import
Note: there is an option to “transpose” some of the fields on import. This is useful for many of the types of visualizations you might want to create from survey data. Typically, you want to transpose fields that represent the questions asked in the survey, but you may not want to transpose demographics data or identifiers. See also the Qualtrics tips on transposing data.
For more tips on how to use Tableau with Qualtrics data, check out the resources below:
Here at Data & GIS Services, we love finding new ways to map things. Earlier this semester I was researching how the Sheets tool in Google Drive could be used as a quick and easy visualization tool when I re-discovered its simple map functionality. While there are plenty of more powerful mapping tools if you want to have a lot of features (e.g., ArcGIS, QGIS, Google Fusion Tables, Google Earth, GeoCommons, Tableau, CartoDB), you might consider just sticking with a spreadsheet for some of your simpler projects.
I’ve created a few examples in a public Google Sheet, so you can see what the data and final maps look like. If you’d like to try creating these maps yourself, you can use this template (you’ll have to log into your Google account first, and then click on the “Use this template” button to get your own copy of the spreadsheet).
Organizing Your Data
The main thing to remember when trying to create any map or chart in a Google sheet is that the tool is very particular about the order of columns. For any map, you will need (exactly) two columns. According to the error message that pops up if your columns are problematic: “The first column should contain location names or addresses. The second column should contain numeric values.”
Of course, I was curious about what counts as “location names” and wanted to test the limits of this GeoMap chart. If you have any experience with the Google Charts API, you might expect the Google Sheet GeoMap chart to work like the Geo Chart offered there. In the spreadsheet, however, you have only a small set of options compared to the charts API. You do have two map options — a “region” (or choropleth) map and a “marker” (or proportional symbol) map — but the choices for color shading and bubble size are built-in or limited.
Region maps (Choropleths)
Region maps are fairly restrictive, because Google needs to know the exact boundary of the country or state that you’re interested in. In a nutshell, a region map can either use country names (or abbreviations) or state names (or abbreviations). The ISO 3166-1 alpha-2 codes seem to work exceptionally well for countries (blazing fast speeds!), but the full country name works well, too. For US states, I also recommend the two letter state abbreviation instead of the full state name. If you ever want to switch the map from “region” to “marker”, the abbreviations are much more specific than the name of the state. (For example, when I switch my “2008 US pres election” map to marker, Washington state turns into a bubble over Washington DC.)
Marker maps (Proportional symbol maps)
Marker maps, on the other hand, allow for much more flexibility. In fact, the marker map in Google Sheets will actually geocode street addresses for you. In general, the marker map will work best if the first column (the location column) includes information that is as specific as possible. As I mentioned before, the word “Washington” will go through a search engine and will get matched to Washington DC before Washington state. Same with New York. But the marker map will basically do the search on any text, so the spreadsheet cell can say “NY”, or “100 State Street, Ithaca, NY”, or even the specific latitude and longitude of a place. (See the “World Capitals with lat/lon” sheet; I just put latitude and longitude in a single column, separated with a comma.) As long as the location information is in a single column, it should work, but the more specific the information is, the better.
When you have your data ready and want to create a map, just select the correct two columns in your spreadsheet, making sure that the first one has appropriate location information and the second one has some kind of numerical data. Then click on the “Insert” menu and go down to “Chart…” You’ll get the chart editor. The first screen will be the “Start” tab, and Google will try to guess what chart you’re trying to use. It probably won’t guess a map on the first try, so just click on the “Charts” tab at the top to manually select a map. Map is one of the lower options on the left hand side, and then you’ll be given a choice between the regions and markers maps. After you select the map, you can either stick with the defaults or go straight to the final tab, “Customize,” to change the colors or to zoom your map into a different region. (NB: As far as I can tell, the only regions that actually work are “World,” “United States,” “Europe,” and “Asia”.)
The default color scale goes from red to white to green. You’ll notice that the maps automatically have a “mid” value for the color. If you’d rather go straight from white to a dark color, just choose something in the middle for the “mid” color.
And there you have it! You can’t change anything beyond the region and the colors, so once you’ve customized those you can click “Update” and check out your map. Don’t like something? Click on the map and a little arrow will appear in the upper right corner. Click there to open the menu, then click on “Advanced edit…” to get back to the chart editor. If you want a bigger version of the map, you can select “Move to own sheet…” from that same menu.
Pros and Cons
So, what are these maps good for? Well, firstly, they’re great if you have state or country data and you want a really quick view of the trends or errors in the data. Maybe you have a country missing and you didn’t even realize it. Maybe one of the values has an extra zero at the end and is much larger than expected. This kind of quick and dirty map might be exactly what you need to do some initial exploration of your data, all while staying in a spreadsheet program.
Another good use of this tool is to make a map where you need to geocode addresses but also have proportional symbols. Google Fusion Tables will geocode addresses for you, but it is best for point maps where all the points are the same size or for density maps that calculate how tightly clusters those points are. If you want the points to be sized (and colored) according to a data variable, this is possibly the easiest geocoder I’ve found. It’ll take a while to search for all of the locations, though, and there is probably an upper limit of a couple of hundred rows.
Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine Google Fusion Tables, which combines visualizations, including spatial visualizations, with a database back end. The key advantages to Fusion Tables are easy display of latitude/longitude data or data that is included with address information. In addition, Fusion Tables provides a one-stop location for producing visualizations other than maps, such as line charts or tables.
Uploaded to Fusion Tables is easy through Google Docs. Simply log in if you have an account, create a new Table, and on the next screen, point to the file you wish to upload. Excel and CSV files are the two most commonly used, and KML files allow for upload of maps that contain spatial information, such as locations or polygon definitions.
One thing to note about Google products is that they are often in a state of flux. Limits and restrictions noted below may change in the future. For further information regarding Google Fusion Tables, please consult this Libguide authored by Mark Thomas.
A complete list of geographic data types can be found at the Google support site. In this post, two of the more common geocoding types will be addressed, address data and data that applies to states, counties, and similar objects.
Address data is pretty easy to work with. Addresses should contain as much information as possible with items separated by spaces only, no commas. For example, 134 Chapel Drive Durham NC 27708 should produce a pretty good geocoding result. In the following example, Durham gun crimes for 2011 were downloaded from the Durham Police Department. The data only came with address information, so city and state data were subsequently added and combined in Excel (location field). In Figure 1, highlighted fields indicate spatial information.
To geocode the addresses, select “Map” under the “Visualize” tab. The program will automatically begin geocoding based on the left-most field containing spatial information, which is city in this case. Changing the field to “location,” which contains the full address information, will correctly geocode these addresses (Figure 2).
Once complete, the geocoded points are plotted on a map (Figure 3). As with other mapping applications, you may apply a symbology to the points in order to visualize your data.
This particular dataset contains a numeric field that identifies 5 general types of crime (crime_cat_num). Under the “Configure styles” link at the top, navigate to “Buckets” and divided the data into five buckets (Figure 4).
Once saved, each color will represent a different type of crime and shown in Figure 5 (red indicates robbery, yellow, assault and so on).
Note that only numeric fields can be used to categorize data, so you may wish to create these fields prior to upload.
Working with polygon data can be a bit trickier because the polygons must be spatially defined. Fusion Tables does this by using kml, which is basically a large piece of text containing all of the coordinates, in order, that define a boundary. For example, in this table, each boundary is defined in the geometry field. Google provides a variety of boundary types, which are available here. If your data match one of these existing boundary types, you may upload data and merge it with the correct table, which will basically import the boundary definitions into your dataset. Otherwise, you will have to locate suitable boundaries in a kml file and import those boundaries before merging.
This dataset displays acreage and farms for each county in North Carolina and originally came from the Census of Agriculture. Note that there must be a field in common between your data and the data containing boundary definitions in order to merge. In addition, merge fields can only be text fields. FIPS codes uniquely identify counties and are contained in both tables. Unfortunately, Google didn’t set up their FIPS fields correctly, so a cleaned up North Carolina county file is located here.
Next, click the merge tab. Copy and paste the URL for the boundaries table and click the “Get” button. In Figure 6, I merged my data to the boundary file using the fips field, which is called “fips” in table 1 and “GEO_ID2” in table 2. A merge will produce a new table, so be sure to name that new table at the bottom.Once complete, styling the map is comparable to point data. First, select “Map” under the “Visualize” tab, and be sure to point the location field at the top left to “geometry” where the boundary definitions are stored. Next, click the configure styles link. Then, select Fill color under the Polygons section.
In Figure 7, I am showing median farm size (in acres) along a gradient. It’s important to note the lower and upper limits to your data in advance as the program will not automatically sense this. In this case, median farm size ranges from 10 to 191. Figure 8 shows the output.
As withsimilar online programs, Fusion Tables allows sharing of data and maps through a variety of avenues, from links to embeddable script to email. The links below point to the two maps produced in this posting.
This overview provides only a brief introduction to the mapping capabilities of Fusion Tables. A broad gallery of applications is located at this site, and it contains a variety of geography types. Some of these use the Fusion Tables API, which is a nice feature that allows for application development with some programming experience. As with the other tools reviewed by this blog, non-standard boundaries are generally absent and can be difficult to locate. For example, a researcher with country-level data from the 1700s may have difficulty finding a country border map from that time. However, maps are available on Fusion Tables for counties, states, countries, and congressional districts, and additional maps can be found on the Internet.
Time series data is easy to display as a line chart, but drawing an interesting story out of the data may be difficult without additional description or clever labeling. One option, however, is to add regions to your time series charts to indicate historical periods or visualization binary data.
Here is an example where a chart of annual U.S. national economic indicators has been enhanced with regions that also indicate contractions in the U.S. business cycle – roughly speaking, economic recessions.
To create this chart, all of the indicators were averaged by year and, where necessary, adjusted for inflation using a conversion factor. Download the time series data Excel file for the data and the chart to follow along.
First, to set up the basic line chart, hold Ctrl (PC) or Cmd (Mac) while you select the following columns:
D (stock price index over CF…)
E (avg. annual unemployment…)
G (GDP over CF…)
I (debt over CF…)
K (interest rate * 10)
L (years economy is in decline)
Youll notice that the columns are color coded. Some colors apply to multiple columns; this is because the values that appear on the chart have been calculated by transforming the raw data in some way. Each line on the final chart thus corresponds to one or more columns of data used to produce the values. Transforming the values helps us by normalizing the values (i.e., adjusting for inflation) or scaling the data series itself (making it possible to see the relationships between many different indicators on a single graph, despite wide variations in the ranges of values).
When we select the six columns above and insert a line chart, we get a rather ugly line chart.
We’ll make several changes to improve this:
Change the “years… in decline” series to an area chart
Select and adjust the x axis labels and ticks
Adjust the y axis range
Customize the color, label, and order of the data series
The basic mechanism of the colored regions on the chart is to use Excel’s “area chart” to create rectangular areas. The area chart essentially takes a line chart and fills the area under the line with a color. If we have a continuous horizontal line as a data series, we will create a large colored rectangle on the chart. To have breaks in the rectangle, we simply need to leave some of the years blank (without values in the cells). To select the appropriate values for column L, we first found the maximum value for the other data series and determined that a value of 20 would create bars starting above the other data series.
To produce the colored regions that indicate contractions in the business cycle, we take the series that was created from column L and turn it into an area chart.
Right-click on any data point in the series or on the legend entry
Select “Change Series Chart Type…”
Select the standard Area chart from the ribbon
The chart now fills in the area under the original lines with a default fill color.
At this point, you can right click on the series again, select “Format Data Series…”, and change the Fill color to a light gray.
Next, we tell the x axis what the correct labels are (the “Year” column) and have the labels show up every 4 years. (Our data series start on an election year, so the labels will always appear on election years.)
Right click inside the chart somewhere and select “Select Data…”
Select any of the data series in the “Series” list, then go over to the “Category (X) axis labels” box and select the “Year” column. Click “OK”.
Right-click on the x axis and select “Format Axis…”.
Change the default interval between labels from 3 to 4
Change the interval between tick marks to 4 as well
Uncheck the box next to “Vertical axis crosses between categories”
Under “Text Box”, select the text direction of “Rotate to 90 deg Counterclockwise”. Click “OK”.
The x axis should have appropriate year labels now. The y axis can similarly be adjusted to show just the range of values we’re most interested in.
Right-click on the axis and select “Format Axis…”.
Under “Scale”, unselect the check box next to “Maximum:” and change the value to 20.
The rest of the changes are simply formatting changes. Right-click on the individual data series to change the colors, line widths, etc. Use the formatting options or the Chart tools on the Excel ribbon to change the font of any text, adjust the grid lines, add labels and titles, etc. The data series names in the legend can be adjusted by using the “Select Data…” option and typing in custom text in the “Name” field.
The final product should have colored regions and look something like the chart below.
In another post, we will show how to spice this chart up even more using Adobe Illustrator.