Explore network analysis, text mining, online mapping, data visualization, and statistics in our spring 2014 workshop series. Our workshops provide a chance to explore new tools or refresh your memory on effective strategies for managing digital research. Interested in keeping up to date with workshops and events in Data and GIS? Subscribe to the dgs-announce listserv or follow us on Twitter (@duke_data).
Data & GIS Services will soon be accepting submissions to its 2nd annual student data visualization contest. If you have a course project that involves visualization, start thinking about your submission now!
The purpose of the contest is to highlight outstanding student data visualization work at Duke University. Data & GIS Services wants to give you a chance to showcase the hard work that goes into your visualization projects.
Data visualization here is broadly defined, encompassing everything from charts and graphs to 3D models to maps to data art. Data visualizations may be part of a larger research project or may be developed specifically to communicate a trend or phenomenon. Some are static images, while others may be animated simulations or interactive web experiences. Browse through last year’s submissions to get an idea of the range of work that counts as visualization.
The Student Data Visualization Contest is sponsored by Data & GIS Services, Perkins Library, Scalable Computing Support Center, Office of Information Technology, and the Office of the Vice Provost for Research.
Analyze, discover, manage, map, and visualize your data with Duke Libraries Data and GIS Services. Our team of five consultants provides a broad range of support in areas including data analysis, data visualization, geographic information systems, financial data, statistical software, and data storage and management. Our lab provides 12 workstations with the latest data software and three Bloomberg Professional workstations nearly 24/7 for the Duke community.
Data and GIS Workshop Series
All are welcome to the Data and GIS Workshop Series. Analyze, communicate, clean, map, represent, and visualize your data with a wide range of workshops on data-based research methods and tools. Details and registration for each class are available at the links that follow. (Interested in keeping up to date with workshops and events in Data and GIS? Just go to https://lists.duke.edu/sympa/info/dgs-announce and click on the “Subscribe” link at the bottom left.)
Data & GIS Services is hiring! We have two open positions for student web programmers interested in working on data visualization projects. See the Library Student employment page (http://library.duke.edu/jobs/students.html) for more information on how to apply. (The job can be found by searching for requisition number “DUL14-AMZ02”.)
New Data and Map Collections
CPS on Web (CPS Utilities Online) CPS on Web is a set of utilities enabling you to access CPS data and documentation from this website. You may make tables and graphs from the CPS data, download data extractions, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones.
Global Financial Data Global Financial Data is a collection of financial and economic data provided in ASCII or Excel format. Data includes: long-term historical indices on stock markets; Total Return data on stocks, bonds, and bills; interest rates; exchange rates; inflation rates; bond indices; commodity indices and prices; consumer price indices; gross domestic product; individual stocks; sector indices; treasury bill yields; wholesale price indices; and unemployment rates covering over 200 countries.
The LandScan Global Population Database provides global population distribution in a gridded GIS format at 30 arc-second resolution (approximately 1×1 km cells). Oak Ridge National Laboratory developed modeling techniques to disaggregate and interpolate census data within administrative boundaries to create a GIS layer showing population distribution as accurately and as timely as possible. EastView provides this data to use in GIS software as a WMS (Web Mapping Service) or as a WCS (Web Coverage Service) to allow a user to incorporate population distribution into GIS mapping and analysis.
Tableau is a data visualization software application that allows you to easily create and share interactive charts, graphs, and maps. While the free version of this tool, Tableau Public, has offered wonderful opportunities for generating and publishing data visualizations, there are file size and format limits that make it difficult for some researchers to use the public tool.
Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine Google Fusion Tables, which combines visualizations, including spatial visualizations, with a database back end. The key advantages to Fusion Tables are easy display of latitude/longitude data or data that is included with address information. In addition, Fusion Tables provides a one-stop location for producing visualizations other than maps, such as line charts or tables.
Uploading to Fusion Tables is easy through Google Docs. Simply log in if you have an account, create a new Table, and on the next screen, point to the file you wish to upload. Excel and CSV files are the two most commonly used, and KML files allow for upload of maps that contain spatial information, such as locations or polygon definitions.
One thing to note about Google products is that they are often in a state of flux. Limits and restrictions noted below may change in the future. For further information regarding Google Fusion Tables, please consult this Libguide authored by Mark Thomas.
A complete list of geographic data types can be found at the Google support site. In this post, two of the more common geocoding types will be addressed: address data and data that applies to states, counties, and similar objects.
Address data is pretty easy to work with. Addresses should contain as much information as possible with items separated by spaces only, no commas. For example, 134 Chapel Drive Durham NC 27708 should produce a pretty good geocoding result. In the following example, Durham gun crimes for 2011 were downloaded from the Durham Police Department. The data only came with address information, so city and state data were subsequently added and combined in Excel (location field). In Figure 1, highlighted fields indicate spatial information.
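The location field described above was built in Excel, but the same step can be scripted. The following is a minimal sketch, assuming each record already has separate street, city, state, and ZIP values; the function name `build_location` is hypothetical and not part of Fusion Tables.

```python
def build_location(street, city, state, zip_code=""):
    """Join address parts with single spaces and no commas."""
    cleaned = []
    for part in (street, city, state, zip_code):
        # Replace commas with spaces, then collapse repeated whitespace.
        text = " ".join(str(part).replace(",", " ").split())
        if text:
            cleaned.append(text)
    return " ".join(cleaned)


# Example using the address quoted in this post.
print(build_location("134 Chapel Drive", "Durham", "NC", "27708"))
```

The result can be written to a new column before upload so that the geocoder has a single field holding the full address.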
To geocode the addresses, select “Map” under the “Visualize” tab. The program will automatically begin geocoding based on the left-most field containing spatial information, which is city in this case. Changing the field to “location,” which contains the full address information, will correctly geocode these addresses (Figure 2).
Once complete, the geocoded points are plotted on a map (Figure 3). As with other mapping applications, you may apply a symbology to the points in order to visualize your data.
This particular dataset contains a numeric field that identifies 5 general types of crime (crime_cat_num). Under the “Configure styles” link at the top, navigate to “Buckets” and divide the data into five buckets (Figure 4).
Once saved, each color will represent a different type of crime, as shown in Figure 5 (red indicates robbery, yellow indicates assault, and so on).
Note that only numeric fields can be used to categorize data, so you may wish to create these fields prior to upload.
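If your data only has a text category, a numeric field like crime_cat_num can be added before upload. This is a rough sketch, assuming the rows are already loaded as dictionaries; the `crime_type` field name, the code numbers, and the sample rows are made up for illustration, and only robbery and assault are categories named in this post.

```python
def add_numeric_category(rows, text_field, numeric_field, codes):
    """Return copies of the rows with a numeric code added for each text category."""
    coded = []
    for row in rows:
        new_row = dict(row)
        key = row[text_field].strip().lower()
        # A KeyError here flags a category that has no code assigned yet.
        new_row[numeric_field] = codes[key]
        coded.append(new_row)
    return coded


# Hypothetical code assignments; extend the mapping to cover every category.
codes = {"robbery": 1, "assault": 2}
rows = [
    {"location": "134 Chapel Drive Durham NC 27708", "crime_type": "Robbery"},
    {"location": "134 Chapel Drive Durham NC 27708", "crime_type": "assault "},
]
coded_rows = add_numeric_category(rows, "crime_type", "crime_cat_num", codes)
print(coded_rows)
```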
Working with polygon data can be a bit trickier because the polygons must be spatially defined. Fusion Tables does this by using KML, which is basically a large piece of text containing all of the coordinates, in order, that define a boundary. For example, in this table, each boundary is defined in the geometry field. Google provides a variety of boundary types, which are available here. If your data match one of these existing boundary types, you may upload data and merge it with the correct table, which will basically import the boundary definitions into your dataset. Otherwise, you will have to locate suitable boundaries in a KML file and import those boundaries before merging.
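To make the idea of a boundary as ordered coordinates more concrete, here is a small sketch that writes a KML Polygon element from a list of longitude/latitude pairs. The coordinates are an invented rectangle, not a real county boundary, and the helper name `polygon_kml` is hypothetical.

```python
def polygon_kml(ring):
    """Build a KML Polygon string from (longitude, latitude) pairs."""
    ring = list(ring)
    # KML rings must be closed: the last coordinate repeats the first.
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    # KML lists each vertex as "lon,lat", with vertices separated by spaces.
    coords = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return (
        "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
        + coords
        + "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    )


# Made-up rectangle for illustration only.
kml = polygon_kml([(-79.0, 35.9), (-78.8, 35.9), (-78.8, 36.1), (-79.0, 36.1)])
print(kml)
```

Note that KML orders each pair as longitude first, then latitude, which is the reverse of how coordinates are often written.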
This dataset displays acreage and farms for each county in North Carolina and originally came from the Census of Agriculture. Note that there must be a field in common between your data and the data containing boundary definitions in order to merge. In addition, merge fields can only be text fields. FIPS codes uniquely identify counties and are contained in both tables. Unfortunately, Google didn’t set up their FIPS fields correctly, so a cleaned up North Carolina county file is located here.
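Because merge fields must be text and must match exactly, it can help to normalize the FIPS codes before upload and to check which records fail to match. The sketch below is one way to do that outside of Fusion Tables, assuming five-digit county FIPS codes; the field names follow the two tables in this post, while the helper names and the sample values are hypothetical.

```python
def fips_text(value, width=5):
    """Convert a FIPS code stored as a number or string to zero-padded text."""
    return str(int(float(value))).zfill(width)


def merge_on_fips(data_rows, boundary_rows, data_key="fips", boundary_key="GEO_ID2"):
    """Join data rows to boundary rows on FIPS; also return unmatched codes."""
    boundaries = {fips_text(b[boundary_key]): b for b in boundary_rows}
    merged, unmatched = [], []
    for row in data_rows:
        key = fips_text(row[data_key])
        if key in boundaries:
            merged.append({**boundaries[key], **row, data_key: key})
        else:
            unmatched.append(key)
    return merged, unmatched


# Made-up values; "geometry" stands in for the KML boundary text.
data = [{"fips": 37063, "median_acres": 40}, {"fips": "37999", "median_acres": 12}]
bounds = [{"GEO_ID2": "37063", "geometry": "<Polygon>...</Polygon>"}]
merged, unmatched = merge_on_fips(data, bounds)
print(len(merged), unmatched)
```

Any codes reported as unmatched point to records that would silently drop out of the merged table.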
Next, click the merge tab. Copy and paste the URL for the boundaries table and click the “Get” button. In Figure 6, I merged my data to the boundary file using the FIPS field, which is called “fips” in table 1 and “GEO_ID2” in table 2. A merge will produce a new table, so be sure to name that new table at the bottom. Once complete, styling the map is comparable to point data. First, select “Map” under the “Visualize” tab, and be sure to point the location field at the top left to “geometry” where the boundary definitions are stored. Next, click the configure styles link. Then, select Fill color under the Polygons section.
In Figure 7, I am showing median farm size (in acres) along a gradient. It’s important to note the lower and upper limits to your data in advance as the program will not automatically sense this. In this case, median farm size ranges from 10 to 191. Figure 8 shows the output.
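Since the limits have to be entered by hand, it is worth computing them before opening the styling dialog. A minimal sketch, assuming the values for the field are available as a list; the function name is hypothetical and the middle sample value is made up, while 10 and 191 are the limits quoted above.

```python
def gradient_limits(values, steps):
    """Return the minimum, maximum, and evenly spaced breakpoints for a gradient."""
    low, high = min(values), max(values)
    width = (high - low) / steps
    breaks = [low + width * i for i in range(steps + 1)]
    return low, high, breaks


low, high, breaks = gradient_limits([10, 55, 191], 5)
print(low, high, breaks)
```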
As with similar online programs, Fusion Tables allows sharing of data and maps through a variety of avenues, from links to embeddable script to email. The links below point to the two maps produced in this posting.
This overview provides only a brief introduction to the mapping capabilities of Fusion Tables. A broad gallery of applications is located at this site, and it contains a variety of geography types. Some of these use the Fusion Tables API, which is a nice feature that allows for application development with some programming experience. As with the other tools reviewed by this blog, non-standard boundaries are generally absent and can be difficult to locate. For example, a researcher with country-level data from the 1700s may have difficulty finding a country border map from that time. However, maps are available on Fusion Tables for counties, states, countries, and congressional districts, and additional maps can be found on the Internet.
ArcGIS Online is a service that allows for storage and sharing of spatial data and maps. In contrast to many other web based GIS services, ArcGIS Online accepts geocoded text-based data and shapefiles, which allows users to share and present work built in ArcGIS Desktop.
Members of the Duke community can register for two different versions of ArcGIS Online. Public access to the service grants access that allows basic file storage and digital mapping. Duke-sponsored access facilitates sharing files within the Duke community and provides a higher threshold for storing and processing ArcGIS files online. Access to the Duke version is available on request for Duke affiliates with a valid Duke email at firstname.lastname@example.org.
Loading and Processing Data
Duke-sponsored access allows users to import and work with text-based data sets containing more than 250 features or shapefiles containing more than 1,000 features online. Data sets that exceed these thresholds must be published as Feature Services, which can be done at the time of upload or any point thereafter. A Feature Service is basically an object that can be brought into a map and differs from a file, which can be uploaded for storage, but cannot be imported into a map.
In addition, Duke sponsored access will allow you to use data shared by other members of the Duke community, which expands the data available beyond those data sources shared with the public and by members of groups to which you belong, both available in either version. Users with modest data needs and users that prefer to use ArcGIS Desktop to create maps will find the public version suitable in most cases. Users with larger datasets and those that will collaborate and present maps online will find Duke-sponsored access much more helpful.
ArcGIS Online provides two entry points for the uploading of data and the production of maps. The first is entered when “My Content” is clicked. This section lists all of the items that have been uploaded and produced by the user. There are two key types of items listed, files and data sources. Files, including text files and shapefiles, are items that can be stored and shared, but are not accessible by the mapping interfaces. By contrast, data sources, the most common of which are Feature Services and Web Maps, can be seen by the mapping interfaces and incorporated into new maps.
Duke sponsored access provides the ability to convert geocoded text files and shapefiles into data sources at the time of upload or any point thereafter. By contrast, the public version does not allow users to create Feature Services, but does allow for the creation of Web Maps data sources from text files containing fewer than 250 features or shapefiles containing fewer than 1,000 features. Data sets exceeding these thresholds will require Duke sponsored access to visualize online.
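The thresholds above can be turned into a quick check before you upload. This is only a sketch of the rule as stated in this post: it assumes a data set at exactly 250 or 1,000 features counts as over the limit, which the post does not spell out, and both the function name and the limits table are hypothetical rather than part of ArcGIS Online.

```python
# Feature limits for the public version, as described in this post.
PUBLIC_LIMITS = {"text": 250, "shapefile": 1000}


def needs_duke_access(file_type, feature_count):
    """Return True if the data set is too large to map with public access."""
    # Assumption: the public version only handles counts below the limit.
    return feature_count >= PUBLIC_LIMITS[file_type]


print(needs_duke_access("text", 100))        # small geocoded text file
print(needs_duke_access("shapefile", 1500))  # shapefile over the limit
```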
The second section of ArcGIS Online is the “Map” section, which opens the map viewer, one of the two mapping tools available in ArcGIS Online. The map viewer allows for the creation of Web Maps, which can be shared online and saved as data sources for new maps. Both versions of ArcGIS Online will allow for the upload of data sources directly into the viewer, the inclusion of public and group data to which the user has access, and the inclusion of data sources previously created by the user.
Once saved, the map can be accessed in the “My Content” section and opened in either map viewer or Explorer, the second mapping tool available. Map viewer allows you to add data from files and web services and to create editable layers. Many styling modifications, like color classification of features and customization of the attribute popup window, are possible. This map displays a customized popup with an added pie chart based on, in this case, a single feature.
Explorer contains the same basic set of features, but it also contains a presentation mode, where slide stills can be taken and arranged for presentation. Again, this map displays a styled popup as well as customized county-level styling based on an attribute.
Sharing Data and Maps
Saved maps can be shared by embeddable script or by link. The map as a data source can also be shared with the public, with members of any groups to which the user belongs, and with the Duke community as a whole (“Duke University and Medical Center –NSOE”).
Online sharing of data and collaboration is a relatively new need that multiple tools are working to fulfill. ArcGIS Online is an excellent option, particularly when online viewing is an important goal. If your online visualization needs are modest, and if you generally prefer to produce maps and edit shapefiles on ArcGIS Desktop, the public version may fulfill your needs. But if the feature restrictions noted above prove prohibitive, Duke sponsored access will provide the flexibility needed for most applications.
Visualize your data, analyze your results, map your statistics, and find the data you need! Come visit us in Perkins 226 (second floor Perkins) for a consultation or contact us online (email: email@example.com or twitter: duke_data OR duke_vis). We look forward to working with you on your next data driven project.
With 12 workstations with dual 24″ monitors and 16 gigs of memory, the new Data and GIS lab is ready to take on the most challenging statistical, mapping, and visualization research projects. The new lab also features a flatbed scanner for projects moving from print to digital data. Lab hours are the same hours as Perkins Library (almost 24/7).
Learn about data management planning. Apply text mining strategies to understand your documents. Visualize your data with Tableau Public, or map your results using ArcGIS or Google Earth Pro. A new series of workshops connects traditional statistical, geospatial, and visualization tools with web based options. Register online for our courses or schedule a session for your course by emailing firstname.lastname@example.org
If you missed last fall’s Bloomberg service – Duke Libraries is pleased to announce the installation of three Bloomberg financial terminals in the Data and GIS Lab in 226 Perkins. The terminals provide the latest news and financial data and include an application that makes it easy to export data to Excel. Access is restricted to all current Duke affiliates. Training on Bloomberg is currently being planned for the last week of September. Please email email@example.com to reserve a space at the training session.
Data and GIS has launched a new guide that provides guidance for researchers looking for advice on data management plans now required by several granting agencies. The guide provides examples of sample plans, key concepts involved in writing a plan, and contact information for groups on campus providing data management advice. In addition, we offer individual consultations with researchers on data management planning.
Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine GeoCommons, a web based tool for presenting spatial data in detail. (Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of GeoCommons.)
GeoCommons is an online mapping application that easily imports a variety of data formats, including geospatial data, and quickly produces sharable maps. In contrast to other mapping tools, GeoCommons contains several categorization algorithms, such as quantile classification and classification based on the standard deviation of the sample, that assist with the construction of informative maps. CSV files and ArcGIS shapefiles are two of the most widely used file formats compatible with GeoCommons.
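For readers unfamiliar with these two classification methods, the sketch below shows the general idea using Python's standard library. It is not GeoCommons' own implementation: the choice of breaks at 0.5 and 1.5 standard deviations from the mean is an assumption borrowed from common GIS practice, the function names are hypothetical, and the acreage values are made up.

```python
import bisect
import statistics


def quantile_breaks(values, classes):
    """Cut points that split the values into groups of roughly equal size."""
    return statistics.quantiles(values, n=classes)


def std_dev_breaks(values):
    """Cut points at fixed multiples of the sample standard deviation (5 classes)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [mean + m * sd for m in (-1.5, -0.5, 0.5, 1.5)]


def classify(value, breaks):
    """Return the class index (0 = lowest) for a value given sorted cut points."""
    return bisect.bisect_right(breaks, value)


# Made-up median farm sizes in acres.
acres = [10, 22, 35, 41, 58, 64, 77, 90, 120, 191]
print(quantile_breaks(acres, 5))
print(std_dev_breaks(acres))
print([classify(a, std_dev_breaks(acres)) for a in acres])
```

Quantile breaks put a similar number of counties in each class, while standard deviation breaks highlight how far each county sits from the average, so the same data can produce quite different maps.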
GeoCommons is very easy to use and contains some of the display features contained in high-end GIS suites. Creation of new variables tied to geographies can be tricky, so it’s advised to either upload data and map in final form or to first identify the layer to which you will upload and join a complete data set.
To begin geocoding, upload a file. GeoCommons has the capability to recognize spatially encoded data. Some formats may require user assistance.
If you’ve uploaded data that contains latitude and longitude coordinates, choose this option. In my case, I had county FIPS codes that uniquely identified each county. Selecting US Boundaries to the left, then USA Counties, I was able to successfully preview how well my FIPS codes matched the layer (Figures 1 and 2). A variety of other boundary types are available. The key is to have in your data a unique identifier that identifies each record in the same manner as an available geocoding layer.
Review the geocoding results and select Continue to proceed.
GeoCommons offers some nice built-in features that assist with categorizing measures. The application will produce summary statistics for numeric fields (Figure 3), which gives you a quick picture of your sample and can assist with how to categorize the data. Click the “Make a Map” button to proceed to the interactive interface.
Also note the filter tab, which allows you to screen out groups of cases. For example, I may request a minimum number of farms to screen out urban counties.
Figure 4 shows a standard choropleth map portraying median number of acres per farm by county for North Carolina in 2007. In this example, I have classified counties into five groups using standard deviations to group counties.
GeoCommons contains a wide variety of ways to share data (accessed through the About section). Posting to Twitter, Facebook, and an array of other social media sites is possible with a few short clicks. You can directly email a link to the map along with a short personal message right out of the application.
For those who wish to post to a web page, GeoCommons provides two ways to insert a map, through a <div> tag and through an iframe. All code is generated for copy and paste into your page.
To access a version of this map, simply follow this link.
Finally, GeoCommons will produce a PNG image and a KML document for download. The image export feature appears to be relatively new and does take trial-and-error to align correctly. In addition, it does not appear to include any base layers or legends in the output, only the data layer.
When using the standard deviation and maximum breaks methods for grouping observations, double check the category definitions by changing the number of categories and noting how the definitions of the new groups change. This will help to confirm whether data are grouped appropriately and exactly what the definitions for each category are.
Visualizing spatial data can be challenging to learn. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently and provide relatively easy alternatives for the display of spatial data. In this ongoing series of alternatives, we review Tableau Public 7 in detail. Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of Tableau 6.1.
Tableau Public is a free software application that allows you to easily map data and share maps through email or web pages by embeddable script. To use Tableau, you must download and install a free desktop application. Tableau Public also requires a free registration to share visualizations created in the software.
Tableau is designed to look and feel like a standard spreadsheet application. Geographic mapping is accomplished by dragging your coordinate fields and dropping them into the columns and rows fields (see Figure 1). In Tableau 7, you may also select “Filled Map” under the Marks panel, and select a geographic identifier for the “Level of Detail” field (see Figure 2). Once done, add the variable to color by to the color field. In these examples, more intense colors indicate larger median farm size, measured in acres.
Tableau generates new fields that hold coordinate data as it imports and geocodes your data. If you wish to create filled maps (states, counties, etc.) in Tableau 7, you must additionally have geographic identifiers that are unique for each case. In Figure 2, the initial map only contained 50 polygons, as 50 North Carolina counties were uniquely named within the United States.
Had I also included a state field, unique identification would have been automatic, but Tableau allowed me to define the state for each case, and lucky for me, I only had North Carolina data.
The geocoding options are extensive. The following list is not exhaustive: area codes, FIPS codes, county/state/country names, ZIP codes, and ISO country codes. Of course, any coordinate data will work for point data.
Tableau is very easy to use, provided your data is reasonably clean. With geographic data, be certain to either have something that uniquely identifies each entity or have latitudes and longitudes. It is preferable to err on the side of including more identification fields rather than fewer (e.g., including state names in addition to counties).
Also be aware that Tableau is not backward-compatible. For example, the workbook used in this example was initially created in Tableau 6.1, modified in Tableau 7, but failed to open once I moved back to Tableau 6.1. However, irrespective of version, you will be able to see any visualizations produced in any version.
The Census Bureau’s American Community Survey provides a continuous measure of the community demographics in the US. A new extension provided by the Department of Geography and Geoinformation Science at George Mason University enhances the mapping of ACS data by allowing researchers to visualize survey estimates while revealing the level of uncertainty in the estimates. ACS Mapping Extensions is an ArcGIS add-on available for both ArcGIS 9.3 and 10. This post provides a brief overview of installation, setup, and use. Detailed technical assistance is provided by the extension.
1) Once you download the program, you will want to install and note the installation directory. In ArcGIS, select Customize from the menu bar, and click Customize Mode…. Then select “Add from file…” and navigate to the installation directory. Once in this directory, select the “ACSMapping.tlb” file.
2) Before you leave the Customize window, be sure to check the “ACS Mapping Tools” toolbar. You will have a new “ACS Mapping” toolbar added to your window.
1) The “Documentation” option in the “ACS Mapping” toolbar provides detailed instructions for downloading ACS data and boundary files. Follow these instructions to the letter and in their entirety. With respect to boundary files, the TIGER 2008 county boundaries were used for this example.
2) Add the boundary layer to a blank map and select “Join ACS Table(s) with Shapefiles” option in the “ACS Mapping” toolbar. In this example, I have downloaded county boundaries and county-level median income data from the 2005-09 ACS. In this figure, the first two fields indicate the items to be joined, one table to one shapefile. “CNTYIDFP” represents the FIPS code in the boundary file, and “GEO_ID2” is the corresponding code in the ACS table. Once you’ve set an output location, select “OK.”
3) Finally, you will want to apply a symbology to the layer. In this case, I chose the median income estimate and 5 total categories. The following figure shows what my map looks like at this point.
Mapping ACS Estimates with Coefficients of Variation
1) The tools are located under the “Mapping Data Uncertainty” option in the ACS Mapping toolbar. The first option, “Overlay CVs with Estimates,” will allow you to visualize the uncertainty of estimates at the same time as the estimates themselves. As noted by the documentation provided by the ACS Mapping Extension web site, ACS provides a margin of error that produces a confidence level of 90%. This tool will convert these data into coefficients of variation that will allow you to assess the quality of the estimates.
2) Select the target layer to which you added symbology, select the variable that stores the estimate to be calculated, and finally, select the variable that stores the margin of error (suffix = “_M”).
3) After you click the “Select” button, you will be presented with the new Symbology options for the new coefficients of variation layer to be generated. In this case, I retained the automatic selections and hit “OK.”
4) Zooming in to central North Carolina, one can see not only that the Research Triangle Area has relatively high incomes compared with much of North Carolina, but that coefficients of variation are lower than they are for parts of northern North Carolina and southern Virginia.
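The conversion from a margin of error to a coefficient of variation can be sketched in a few lines. This follows the standard Census Bureau approach for 90% margins of error (divide by 1.645 to get the standard error); whether the extension computes it in exactly this way is an assumption, and the function name and the sample numbers are hypothetical.

```python
Z_90 = 1.645  # z-score for the 90% confidence level used by ACS margins of error


def coefficient_of_variation(estimate, margin_of_error, z=Z_90):
    """Return the coefficient of variation as a percentage of the estimate."""
    if estimate == 0:
        return None  # the CV is undefined when the estimate is zero
    standard_error = margin_of_error / z
    return 100.0 * standard_error / estimate


# Made-up example: a median income estimate with a 90% margin of error.
print(coefficient_of_variation(50000, 1645))
```

Lower values indicate more reliable estimates, which is what the overlay layer is meant to reveal.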
Measuring Significant Differences in Income
1) The second option, “Identify Areas of Significant Differences,” allows you to assess whether there is a significant difference between one spatial unit and all other spatial units for a given variable. In order for this option to work, you must select one specific spatial unit. In this example, I selected Durham County and will assess whether there are significant differences in median household income in the region.
2) First, select the target layer for which you selected a single feature. You want to verify the estimates and margin of error variables, and you can adjust the confidence level from the default 90%. Select OK.
3) The output is represented by four different symbologies. First, your chosen county is filled with dots. All counties that are significantly different are striped, while all those that are not are empty. Finally, when significance cannot be determined, the original color fill is replaced with a new color. In this case, median household income is not significantly different between Durham and Chatham counties. However, this could be due to small differences or large margins of error in one or both counties.
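The comparison behind this kind of map can be illustrated with the usual test for two ACS estimates: convert each margin of error to a standard error, then compare the difference with the combined standard error. This is a sketch of that general method rather than the extension's code, and the function name and income figures are invented for illustration.

```python
import math

Z_90 = 1.645  # z-score matching the 90% margins of error published with ACS data


def significantly_different(est1, moe1, est2, moe2, z_crit=Z_90):
    """Return (is_significant, z) for the difference between two ACS estimates."""
    se1 = moe1 / Z_90
    se2 = moe2 / Z_90
    combined = math.sqrt(se1 ** 2 + se2 ** 2)
    if combined == 0:
        return est1 != est2, None  # no sampling error reported for either estimate
    z = (est1 - est2) / combined
    return abs(z) > z_crit, z


# Made-up incomes: a large gap with small margins, then a small gap with large margins.
print(significantly_different(50000, 1645, 60000, 1645))
print(significantly_different(50000, 3290, 51000, 3290))
```

The second call shows the situation described above: a small difference combined with wide margins of error does not reach significance.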