The lab features three distinct areas for supporting data driven research.
Data and Visualization Lab Space
Our lab space features twelve high-end workstations with the latest software for data visualization, digital mapping, statistics, and qualitative research. All of the machines have two dedicated displays to encourage collaborative work and data consultations. Additionally, each machine has a dedicated power port located conveniently under the edge of the table for powering a laptop or USB-powered device.
Bloomberg Professional “Bar”
Since the launch of our Bloomberg terminals, we have seen a steady increase in both individual and team-based usage of Bloomberg financial data. Our three Bloomberg Professional workstations are now located on a dedicated “bar” across from our lab machines. The new Bloomberg zone will facilitate collaborative work and provide a base for groups such as the Duke University Investment Club and Duke Financial Economics Center.
Consult and Collaborative Space
Our third lab space provides a set of four rolling tables for small groups to collaborate or for projects that don’t require a fixed computing space. An 85″ flat panel display near this zone features data visualizations and other data driven research projects at Duke.
Come See Us!
With ample natural light, almost 24/7 availability, and a welcoming staff eager to work with you on your next data-driven project, the lab is ready for your visit. We look forward to working with you in the upcoming year!
Here at Data & GIS Services, we love finding new ways to map things. Earlier this semester I was researching how the Sheets tool in Google Drive could be used as a quick and easy visualization tool when I re-discovered its simple map functionality. While there are plenty of more powerful mapping tools if you want to have a lot of features (e.g., ArcGIS, QGIS, Google Fusion Tables, Google Earth, GeoCommons, Tableau, CartoDB), you might consider just sticking with a spreadsheet for some of your simpler projects.
I’ve created a few examples in a public Google Sheet, so you can see what the data and final maps look like. If you’d like to try creating these maps yourself, you can use this template (you’ll have to log into your Google account first, and then click on the “Use this template” button to get your own copy of the spreadsheet).
Organizing Your Data
The main thing to remember when trying to create any map or chart in a Google sheet is that the tool is very particular about the order of columns. For any map, you will need (exactly) two columns. According to the error message that pops up if your columns are problematic: “The first column should contain location names or addresses. The second column should contain numeric values.”
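If you prepare your data outside of the spreadsheet, a quick check of the two-column rule can save a round of error messages. The following is only a minimal sketch in Python; the function name `check_map_columns` is made up for illustration and is not part of any Google tool.

```python
def check_map_columns(rows):
    """Return a list of problems found in (location, value) rows."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # The map expects exactly two columns: location first, number second.
        if len(row) != 2:
            problems.append(f"row {i}: expected 2 columns, found {len(row)}")
            continue
        location, value = row
        if not str(location).strip():
            problems.append(f"row {i}: location is empty")
        try:
            float(value)
        except (TypeError, ValueError):
            problems.append(f"row {i}: value {value!r} is not numeric")
    return problems


print(check_map_columns([("NC", 10), ("VA", "n/a")]))
```

An empty list means every row has a location in the first column and something numeric in the second.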
Of course, I was curious about what counts as “location names” and wanted to test the limits of this GeoMap chart. If you have any experience with the Google Charts API, you might expect the Google Sheet GeoMap chart to work like the Geo Chart offered there. In the spreadsheet, however, you have only a small set of options compared to the charts API. You do have two map options — a “region” (or choropleth) map and a “marker” (or proportional symbol) map — but the choices for color shading and bubble size are built-in or limited.
Region maps (Choropleths)
Region maps are fairly restrictive, because Google needs to know the exact boundary of the country or state that you’re interested in. In a nutshell, a region map can either use country names (or abbreviations) or state names (or abbreviations). The ISO 3166-1 alpha-2 codes seem to work exceptionally well for countries (blazing fast speeds!), but the full country name works well, too. For US states, I also recommend the two letter state abbreviation instead of the full state name. If you ever want to switch the map from “region” to “marker”, the abbreviations are much more specific than the name of the state. (For example, when I switch my “2008 US pres election” map to marker, Washington state turns into a bubble over Washington DC.)
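If your source data uses full state names, you could convert them to abbreviations before pasting them into the sheet. Here is a rough Python sketch of that idea. The lookup table is deliberately partial and the helper name `to_abbreviation` is hypothetical; a real project would need all of the states.

```python
# Partial lookup for illustration only; extend it to cover every state you need.
STATE_ABBREVIATIONS = {
    "North Carolina": "NC",
    "New York": "NY",
    "Washington": "WA",
    "Virginia": "VA",
}


def to_abbreviation(name, lookup=STATE_ABBREVIATIONS):
    """Return the two-letter abbreviation for a state name when it is known."""
    cleaned = name.strip()
    # Already an abbreviation we know about: normalize the case and return it.
    if cleaned.upper() in lookup.values():
        return cleaned.upper()
    # Otherwise look up the full name; unknown names are returned unchanged.
    return lookup.get(cleaned.title(), cleaned)


print(to_abbreviation(" washington "))
```

Names that are not in the table come back unchanged, so they are easy to spot and fix by hand.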
Marker maps (Proportional symbol maps)
Marker maps, on the other hand, allow for much more flexibility. In fact, the marker map in Google Sheets will actually geocode street addresses for you. In general, the marker map will work best if the first column (the location column) includes information that is as specific as possible. As I mentioned before, the word “Washington” will go through a search engine and will get matched to Washington DC before Washington state. Same with New York. But the marker map will basically do the search on any text, so the spreadsheet cell can say “NY”, or “100 State Street, Ithaca, NY”, or even the specific latitude and longitude of a place. (See the “World Capitals with lat/lon” sheet; I just put latitude and longitude in a single column, separated with a comma.) As long as the location information is in a single column, it should work, but the more specific the information is, the better.
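If your coordinates start out in two separate columns, they need to be combined into one cell first. A small Python sketch of that step follows; the function name `lat_lon_cell` and the choice of four decimal places are my own assumptions, not something the spreadsheet requires.

```python
def lat_lon_cell(lat, lon, places=4):
    """Combine latitude and longitude into one comma-separated cell value."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat:.{places}f},{lon:.{places}f}"


print(lat_lon_cell(36.0014, -78.9382))
```

The range checks are there to catch the most common mistake, which is swapping the two values.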
When you have your data ready and want to create a map, just select the correct two columns in your spreadsheet, making sure that the first one has appropriate location information and the second one has some kind of numerical data. Then click on the “Insert” menu and go down to “Chart…” You’ll get the chart editor. The first screen will be the “Start” tab, and Google will try to guess what chart you’re trying to use. It probably won’t guess a map on the first try, so just click on the “Charts” tab at the top to manually select a map. Map is one of the lower options on the left hand side, and then you’ll be given a choice between the regions and markers maps. After you select the map, you can either stick with the defaults or go straight to the final tab, “Customize,” to change the colors or to zoom your map into a different region. (NB: As far as I can tell, the only regions that actually work are “World,” “United States,” “Europe,” and “Asia”.)
The default color scale goes from red to white to green. You’ll notice that the maps automatically have a “mid” value for the color. If you’d rather go straight from white to a dark color, just choose something in the middle for the “mid” color.
And there you have it! You can’t change anything beyond the region and the colors, so once you’ve customized those you can click “Update” and check out your map. Don’t like something? Click on the map and a little arrow will appear in the upper right corner. Click there to open the menu, then click on “Advanced edit…” to get back to the chart editor. If you want a bigger version of the map, you can select “Move to own sheet…” from that same menu.
Pros and Cons
So, what are these maps good for? Well, firstly, they’re great if you have state or country data and you want a really quick view of the trends or errors in the data. Maybe you have a country missing and you didn’t even realize it. Maybe one of the values has an extra zero at the end and is much larger than expected. This kind of quick and dirty map might be exactly what you need to do some initial exploration of your data, all while staying in a spreadsheet program.
Another good use of this tool is to make a map where you need to geocode addresses but also have proportional symbols. Google Fusion Tables will geocode addresses for you, but it is best for point maps where all the points are the same size or for density maps that calculate how tightly clustered those points are. If you want the points to be sized (and colored) according to a data variable, this is possibly the easiest geocoder I’ve found. It’ll take a while to search for all of the locations, though, and there is probably an upper limit of a couple of hundred rows.
Explore network analysis, text mining, online mapping, data visualization, and statistics in our spring 2014 workshop series. Our workshops provide a chance to explore new tools or refresh your memory on effective strategies for managing digital research. Interested in keeping up to date with workshops and events in Data and GIS? Subscribe to the dgs-announce listserv or follow us on Twitter (@duke_data).
Data & GIS Services will soon be accepting submissions to its 2nd annual student data visualization contest. If you have a course project that involves visualization, start thinking about your submission now!
The purpose of the contest is to highlight outstanding student data visualization work at Duke University. Data & GIS Services wants to give you a chance to showcase the hard work that goes into your visualization projects.
Data visualization here is broadly defined, encompassing everything from charts and graphs to 3D models to maps to data art. Data visualizations may be part of a larger research project or may be developed specifically to communicate a trend or phenomenon. Some are static images, while others may be animated simulations or interactive web experiences. Browse through last year’s submissions to get an idea of the range of work that counts as visualization.
The Student Data Visualization Contest is sponsored by Data & GIS Services, Perkins Library, Scalable Computing Support Center, Office of Information Technology, and the Office of the Vice Provost for Research.
Analyze, discover, manage, map, and visualize your data with Duke Libraries Data and GIS Services. Our team of five consultants provides support in data analysis, data visualization, geographic information systems, financial data, statistical software, and data storage and management. Our lab provides 12 workstations with the latest data software and three Bloomberg Professional workstations nearly 24/7 for the Duke community.
Data and GIS Workshop Series
All are welcome to the Data and GIS Workshop Series. Analyze, communicate, clean, map, represent and visualize your data with a wide range of workshops on data based research methods and tools. Details and registration for each class are available at the links that follow. (Interested in keeping up to date with workshops and events in Data and GIS? Just go to https://lists.duke.edu/sympa/info/dgs-announce and click on the “Subscribe” link at the bottom left.)
Data & GIS Services is hiring! We have two open positions for student web programmers interested in working on data visualization projects. See the Library Student employment page (http://library.duke.edu/jobs/students.html) for more information on how to apply. (The job can be found by searching for requisition number “DUL14-AMZ02”.)
New Data and Map Collections
CPS on Web (CPS Utilities Online) CPS on Web is a set of utilities enabling you to access CPS data and documentation from this website. You may make tables and graphs from the CPS data, download data extractions, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones.
Global Financial Data Global Financial Data is a collection of financial and economic data provided in ASCII or Excel format. Data includes: long-term historical indices on stock markets; Total Return data on stocks, bonds, and bills; interest rates; exchange rates; inflation rates; bond indices; commodity indices and prices; consumer price indices; gross domestic product; individual stocks; sector indices; treasury bill yields; wholesale price indices; and unemployment rates covering over 200 countries.
The LandScan Global Population Database provides global population distribution in a gridded GIS format at 30 arc-second resolution (approximately 1×1 km cells). Oak Ridge National Laboratory developed modeling techniques to disaggregate and interpolate census data within administrative boundaries to create a GIS layer showing population distribution as accurately and as timely as possible. EastView provides this data to use in GIS software as a WMS (Web Mapping Service) or as a WCS (Web Coverage Service) to allow a user to incorporate population distribution into GIS mapping and analysis.
Tableau is a data visualization software application that allows you to easily create and share interactive charts, graphs, and maps. While the free version of this tool, Tableau Public, has offered wonderful opportunities for generating and publishing data visualizations, there are file size and format limits that make it difficult for some researchers to use the public tool.
Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine Google Fusion Tables, which combines visualizations, including spatial visualizations, with a database back end. The key advantages to Fusion Tables are easy display of latitude/longitude data or data that is included with address information. In addition, Fusion Tables provides a one-stop location for producing visualizations other than maps, such as line charts or tables.
Uploading to Fusion Tables is easy through Google Docs. Simply log in if you have an account, create a new Table, and on the next screen, point to the file you wish to upload. Excel and CSV files are the two most commonly used, and KML files allow for upload of maps that contain spatial information, such as locations or polygon definitions.
One thing to note about Google products is that they are often in a state of flux. Limits and restrictions noted below may change in the future. For further information regarding Google Fusion Tables, please consult this Libguide authored by Mark Thomas.
A complete list of geographic data types can be found at the Google support site. In this post, two of the more common geocoding types will be addressed, address data and data that applies to states, counties, and similar objects.
Address data is pretty easy to work with. Addresses should contain as much information as possible with items separated by spaces only, no commas. For example, 134 Chapel Drive Durham NC 27708 should produce a pretty good geocoding result. In the following example, Durham gun crimes for 2011 were downloaded from the Durham Police Department. The data only came with address information, so city and state data were subsequently added and combined in Excel (location field). In Figure 1, highlighted fields indicate spatial information.
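The combining step was done in Excel here, but the same location field could be built with a short script. The Python sketch below is one way to do it; the function name `build_location` and its argument names are hypothetical.

```python
def build_location(street, city, state, postal_code=""):
    """Join address parts with single spaces and no commas."""
    parts = [street, city, state, postal_code]
    # Replace commas with spaces, then collapse any repeated whitespace.
    cleaned = [" ".join(str(part).replace(",", " ").split()) for part in parts]
    return " ".join(part for part in cleaned if part)


print(build_location("134 Chapel Drive", "Durham", "NC", "27708"))
```

For the sample address this prints `134 Chapel Drive Durham NC 27708`, matching the format described above.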
To geocode the addresses, select “Map” under the “Visualize” tab. The program will automatically begin geocoding based on the left-most field containing spatial information, which is city in this case. Changing the field to “location,” which contains the full address information, will correctly geocode these addresses (Figure 2).
Once complete, the geocoded points are plotted on a map (Figure 3). As with other mapping applications, you may apply a symbology to the points in order to visualize your data.
This particular dataset contains a numeric field that identifies 5 general types of crime (crime_cat_num). Under the “Configure styles” link at the top, navigate to “Buckets” and divide the data into five buckets (Figure 4).
Once saved, each color will represent a different type of crime, as shown in Figure 5 (red indicates robbery, yellow indicates assault, and so on).
Note that only numeric fields can be used to categorize data, so you may wish to create these fields prior to upload.
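If your categories only exist as text labels, a numeric field can be generated before upload. This Python sketch numbers the labels in the order they first appear. The function name and the numbering scheme are assumptions for illustration, and the resulting numbers will not necessarily match the crime_cat_num values used in this dataset.

```python
def add_category_numbers(labels):
    """Assign 1, 2, 3, ... to labels in order of first appearance."""
    codes = {}
    numbered = []
    for label in labels:
        # Normalize case and spacing so "Robbery" and "robbery " share a code.
        key = label.strip().lower()
        if key not in codes:
            codes[key] = len(codes) + 1
        numbered.append(codes[key])
    return numbered, codes


numbers, codes = add_category_numbers(["Robbery", "Assault", "robbery "])
print(numbers, codes)
```

Keeping the `codes` dictionary around gives you a legend for the bucket colors later.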
Working with polygon data can be a bit trickier because the polygons must be spatially defined. Fusion Tables does this by using kml, which is basically a large piece of text containing all of the coordinates, in order, that define a boundary. For example, in this table, each boundary is defined in the geometry field. Google provides a variety of boundary types, which are available here. If your data match one of these existing boundary types, you may upload data and merge it with the correct table, which will basically import the boundary definitions into your dataset. Otherwise, you will have to locate suitable boundaries in a kml file and import those boundaries before merging.
This dataset displays acreage and farms for each county in North Carolina and originally came from the Census of Agriculture. Note that there must be a field in common between your data and the data containing boundary definitions in order to merge. In addition, merge fields can only be text fields. FIPS codes uniquely identify counties and are contained in both tables. Unfortunately, Google didn’t set up their FIPS fields correctly, so a cleaned up North Carolina county file is located here.
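Because merge fields must be text, it can help to normalize FIPS codes before upload. County FIPS codes are five digits long, and spreadsheets often drop a leading zero or turn the code into a number. The Python sketch below shows one possible cleanup; the helper name `fips_text` is hypothetical, and this is not necessarily the fix that was applied to the cleaned-up file mentioned above.

```python
def fips_text(code, width=5):
    """Return a FIPS code as zero-padded text."""
    text = str(code).strip()
    # Spreadsheet exports sometimes turn 37063 into 37063.0.
    if text.endswith(".0"):
        text = text[:-2]
    if not text.isdigit():
        raise ValueError(f"not a numeric FIPS code: {code!r}")
    return text.zfill(width)


print(fips_text(1001), fips_text("37063"))
```

Applying the same function to the FIPS column in both tables before merging makes it more likely that the values will match exactly.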
Next, click the merge tab. Copy and paste the URL for the boundaries table and click the “Get” button. In Figure 6, I merged my data to the boundary file using the fips field, which is called “fips” in table 1 and “GEO_ID2” in table 2. A merge will produce a new table, so be sure to name that new table at the bottom.
Once complete, styling the map is comparable to point data. First, select “Map” under the “Visualize” tab, and be sure to point the location field at the top left to “geometry” where the boundary definitions are stored. Next, click the configure styles link. Then, select Fill color under the Polygons section.
In Figure 7, I am showing median farm size (in acres) along a gradient. It’s important to note the lower and upper limits to your data in advance as the program will not automatically sense this. In this case, median farm size ranges from 10 to 191. Figure 8 shows the output.
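Since the limits have to be entered by hand, it is worth computing them before you open the styling dialog. Here is a minimal Python sketch; the function name `gradient_limits` is made up, and it assumes blank cells should simply be skipped.

```python
def gradient_limits(values):
    """Return the (lowest, highest) numeric values, skipping blank cells."""
    numbers = [float(v) for v in values if v not in ("", None)]
    if not numbers:
        raise ValueError("no numeric values found")
    return min(numbers), max(numbers)


print(gradient_limits([10, "191", "", 75.5]))
```

For this toy list the result is `(10.0, 191.0)`, the same range as the median farm size example.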
As with similar online programs, Fusion Tables allows sharing of data and maps through a variety of avenues, from links to embeddable script to email. The links below point to the two maps produced in this posting.
This overview provides only a brief introduction to the mapping capabilities of Fusion Tables. A broad gallery of applications is located at this site, and it contains a variety of geography types. Some of these use the Fusion Tables API, which is a nice feature that allows for application development with some programming experience. As with the other tools reviewed by this blog, non-standard boundaries are generally absent and can be difficult to locate. For example, a researcher with country-level data from the 1700s may have difficulty finding a country border map from that time. However, maps are available on Fusion Tables for counties, states, countries, and congressional districts, and additional maps can be found on the Internet.
ArcGIS Online is a service that allows for storage and sharing of spatial data and maps. In contrast to many other web based GIS services, ArcGIS Online accepts geocoded text-based data and shapefiles, which allows users to share and present work built in ArcGIS Desktop.
Members of the Duke community can register for two different versions of ArcGIS Online. Public access to the service allows basic file storage and digital mapping. Duke-sponsored access facilitates sharing files within the Duke community and provides a higher threshold for storing and processing ArcGIS files online. Access to the Duke version is available on request for Duke affiliates with a valid Duke email at firstname.lastname@example.org.
Loading and Processing Data
Duke-sponsored access allows users to import and work with text-based data sets containing more than 250 features or shapefiles containing more than 1,000 features online. Data sets that exceed these thresholds must be published as Feature Services, which can be done at the time of upload or any point thereafter. A Feature Service is basically an object that can be brought into a map and differs from a file, which can be uploaded for storage, but cannot be imported into a map.
In addition, Duke sponsored access will allow you to use data shared by other members of the Duke community, which expands the data available beyond those data sources shared with the public and by members of groups to which you belong, both available in either version. Users with modest data needs and users who prefer to use ArcGIS Desktop to create maps will find the public version suitable in most cases. Users with larger datasets and those who will collaborate and present maps online will find Duke-sponsored access much more helpful.
ArcGIS Online provides two entry points for the uploading of data and the production of maps. The first is entered when “My Content” is clicked. This section lists all of the items that have been uploaded and produced by the user. There are two key types of items listed, files and data sources. Files, including text files and shapefiles, are items that can be stored and shared, but are not accessible by the mapping interfaces. By contrast, data sources, the most common of which are Feature Services and Web Maps, can be seen by the mapping interfaces and incorporated into new maps.
Duke sponsored access provides the ability to convert geocoded text files and shapefiles into data sources at the time of upload or any point thereafter. By contrast, the public version does not allow users to create Feature Services, but does allow for the creation of Web Maps data sources from text files containing fewer than 250 features or shapefiles containing fewer than 1,000 features. Data sets exceeding these thresholds will require Duke sponsored access to visualize online.
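To decide which version you need, you can count the features in a file and compare the count against the limits quoted above. The Python sketch below hard-codes those two numbers. The function and dictionary names are hypothetical, and the limits themselves may change over time, so treat this as a rough guide only.

```python
# Limits quoted in this post for the public version; they may change.
PUBLIC_FEATURE_LIMITS = {"text": 250, "shapefile": 1000}


def fits_public_version(file_type, feature_count):
    """True when the count is below the limit quoted for the public version."""
    return feature_count < PUBLIC_FEATURE_LIMITS[file_type]


print(fits_public_version("text", 120), fits_public_version("shapefile", 1500))
```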
The second section of ArcGIS Online is the “Map” section, which opens the map viewer, one of the two mapping tools available in ArcGIS Online. The map viewer allows for the creation of Web Maps, which can be shared online and saved as data sources for new maps. Both versions of ArcGIS Online will allow for the upload of data sources directly into the viewer, the inclusion of public and group data to which the user has access, and the inclusion of data sources previously created by the user.
Once saved, the map can be accessed in the “My Content” section and opened in either map viewer or Explorer, the second mapping tool available.
Map viewer allows you to add data from files and web services and to create editable layers. Many styling modifications like color classification of features and customization of the attribute popup window are possible. This map displays a customized popup with an added pie chart based on, in this case, a single feature.
Explorer contains the same basic set of features, but it also contains a presentation mode, where slide stills can be taken and arranged for presentation. Again, this map displays a styled popup as well as customized county-level styling based on an attribute.
Sharing Data and Maps
Saved maps can be shared by embeddable script or by link. The map as a data source can also be shared with the public, with members of any groups to which the user belongs, and with the Duke community as a whole (“Duke University and Medical Center –NSOE”).
Online sharing of data and collaboration is a relatively new need that multiple tools are working to fulfill. ArcGIS Online is an excellent option, particularly when online viewing is an important goal. If your online visualization needs are modest, and if you generally prefer to produce maps and edit shapefiles on ArcGIS Desktop, the public version may fulfill your needs. But if the feature restrictions noted above prove prohibitive, Duke sponsored access will provide the flexibility needed for most applications.
Visualize your data, analyze your results, map your statistics, and find the data you need! Come visit us in Perkins 226 (second floor Perkins) for a consultation or contact us online (email: email@example.com or twitter: duke_data OR duke_vis). We look forward to working with you on your next data driven project.
With 12 workstations with dual 24″ monitors and 16 gigs of memory, the new Data and GIS lab is ready to take on the most challenging statistical, mapping, and visualization research projects. The new lab also features a flatbed scanner for projects moving from print to digital data. Lab hours are the same hours as Perkins Library (almost 24/7).
Learn about data management planning. Apply text mining strategies to understand your documents. Visualize your data with Tableau Public, or map your results using ArcGIS or Google Earth Pro. A new series of workshops connects traditional statistical, geospatial, and visualization tools with web based options. Register online for our courses or schedule a session for your course by emailing firstname.lastname@example.org
If you missed last fall’s Bloomberg service – Duke Libraries is pleased to announce the installation of three Bloomberg financial terminals in the Data and GIS Lab in 226 Perkins. The terminals provide the latest news and financial data and include an application that makes it easy to export data to Excel. Access is restricted to current Duke affiliates. Training on Bloomberg is currently being planned for the last week of September. Please email email@example.com to reserve a space at the training session.
Data and GIS has launched a new guide that provides guidance for researchers looking for advice on data management plans now required by several granting agencies. The guide provides examples of sample plans, key concepts involved in writing a plan, and contact information for groups on campus providing data management advice. In addition, we offer individual consultations with researchers on data management planning.
Visualizing spatial data can be challenging. Specialized software tools like ArcGIS produce excellent results, but often seem complex for relatively simple tasks. Several online tools have emerged recently that provide relatively easy alternatives for the display of spatial data. In this post, we examine GeoCommons, a web based tool for presenting spatial data in detail. (Go to this guide to see a comparison chart of packages and features, and see this Duke University Libguide for a more detailed review of GeoCommons.)
GeoCommons is an online mapping application that easily imports a variety of data formats, including geospatial data, and quickly produces sharable maps. In contrast to other mapping tools, GeoCommons contains several categorization algorithms, such as quantile classification and classification based on the standard deviation of the sample, that assist with the construction of informative maps. CSV files and ArcGIS shapefiles are two of the most widely used file formats compatible with GeoCommons.
GeoCommons is very easy to use and contains some of the display features contained in high-end GIS suites. Creation of new variables tied to geographies can be tricky, so it’s advised to either upload data and map in final form or to first identify the layer to which you will upload and join a complete data set.
To begin geocoding, upload a file. GeoCommons has the capability to recognize spatially encoded data. Some formats may require user assistance.
If you’ve uploaded data that contains latitude and longitude coordinates, choose this option. In my case, I had county FIPS codes that uniquely identified each county. Selecting US Boundaries to the left, then USA Counties, I was able to successfully preview how well my FIPS codes matched the layer (Figures 1 and 2). A variety of other boundary types are available. The key is to have in your data a unique identifier that identifies each record in the same manner as an available geocoding layer.
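You can also check how well your identifiers line up with a boundary layer before uploading anything. The Python sketch below compares two lists of identifiers with set operations. The function name `match_report` is hypothetical, and it assumes you have exported the layer's identifiers to a list yourself.

```python
def match_report(data_ids, layer_ids):
    """Summarize how well the identifiers in your data match a boundary layer."""
    data = set(data_ids)
    layer = set(layer_ids)
    return {
        "matched": len(data & layer),
        # Identifiers in your data with no boundary to attach to.
        "unmatched_in_data": sorted(data - layer),
        # Boundaries that will have no data.
        "missing_from_data": sorted(layer - data),
    }


print(match_report(["37001", "37063", "3706"], ["37001", "37063", "37183"]))
```

Anything listed under `unmatched_in_data` is worth fixing first, since those records will not appear on the map.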
Review the geocoding results and select Continue to proceed.
GeoCommons offers some nice built-in features that assist with categorizing measures. The application will produce summary statistics for numeric fields (Figure 3), which gives you a quick picture of your sample and can assist with how to categorize the data. Click the “Make a Map” button to proceed to the interactive interface.
Also note the filter tab, which allows you to screen out groups of cases. For example, I may request a minimum number of farms to screen out urban counties.
Figure 4 shows a standard choropleth map portraying median number of acres per farm by county for North Carolina in 2007. In this example, I have classified counties into five groups using standard deviations to group counties.
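To get a feel for what a five-group standard deviation classification does, here is a small Python sketch. It places breaks at -1.5, -0.5, 0.5, and 1.5 standard deviations from the mean, which is one common convention and purely an assumption on my part. I do not know the exact breaks GeoCommons uses, so expect its groups to differ.

```python
from statistics import mean, pstdev

# Assumed break points, in standard deviations from the mean.
BREAKS = [-1.5, -0.5, 0.5, 1.5]


def classify(values):
    """Assign each value to a class from 1 (far below mean) to 5 (far above)."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("all values are identical; nothing to classify")
    classes = []
    for value in values:
        z = (value - center) / spread
        # Count how many break points the z-score has reached.
        classes.append(1 + sum(z >= b for b in BREAKS))
    return classes


print(classify([10, 20, 30, 40, 50]))
```

For this evenly spaced toy list, the result is `[2, 2, 3, 4, 4]`: no value is more than 1.5 standard deviations from the mean, so the two outer classes stay empty.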
GeoCommons contains a wide variety of ways to share data (accessed through the About section). Posting to Twitter, Facebook, and an array of other social media sites is possible with a few short clicks. You can directly email a link to the map along with a short personal message right out of the application.
For those who wish to post to a web page, GeoCommons provides two ways to insert a map, through a <div> tag and through an iframe. All code is generated for copy and paste into your page.
To access a version of this map, simply follow this link.
Finally, GeoCommons will produce a PNG image and a KML document for download. The image export feature appears to be relatively new and does take trial-and-error to align correctly. In addition, it does not appear to include any base layers or legends in the output, only the data layer.
When using the standard deviation or maximum breaks methods to group observations, double-check the category definitions by changing the number of categories and watching how the definitions of the resulting groups change. This helps confirm that the data are grouped appropriately and shows exactly how each category is defined.