The questions asked in the U.S. Census have changed over time to reflect both the data-collection needs of federal agencies and evolving societal norms. Census geographies have also evolved over the same period to reflect population change and shifting administrative boundaries in the United States.
Attempts to Provide Standardized Data
For the researcher who needs to compare demographic and socioeconomic data over time, this variability in data and geography can be problematic. Various data providers have attempted to harmonize questions and to generate standard geographies using algorithms that allow for comparisons over time. Some of the projects mentioned in this post have used sophisticated weighting techniques to make more accurate estimates. See, for instance, the NHGIS documentation on standardizing 1990 and 2000 data to 2010 geography.
The NHGIS Time Series Tables link census summary statistics across time and may require two types of integration: attribute integration, ensuring that the measured characteristics in a time series are comparable across time, and geographic integration, ensuring that the areas summarized by time series are comparable across time.
For geographic integration, NHGIS often uses “nominally integrated tables,” in which the aggregated data is presented as it was originally compiled. For instance, “Durham County” data from 1960 and 2000 would be compared based solely on the county's common name.
For geographically standardized tables, in which data from one year is aggregated to geographic areas from another year, NHGIS provides documentation detailing the weighting algorithms it uses.
NHGIS has also resolved discrepancies in the electronic boundary files, as it illustrates with an example from an area of Cincinnati.
The Social Explorer Comparability Data is similar to the NHGIS Time Series Tables, but with more of a drill-down consumer interface. (Go to Tables and scroll down to the Comparability Data.) Only 2000 to 2010 data are available at the state, county, and census tract level. It provides data reallocated from the 2000 U.S. decennial census to the 2010 geographies, so you can get the earlier data in 2010 geographies for better comparison with 2010 data.
The Longitudinal Tract Database (LTDB), developed at Brown University, provides normalized boundaries at the census tract level for 1970-2010. Question coverage over time varies. Documentation for the project is available online.
NC State has translated this data into ArcGIS geodatabase format and provides a README file, a codebook, and the geodatabase for download.
If you need to normalize data that isn’t yet available this way, GIS software may be able to help. Using intersection and re-combining techniques, this software may be able to generate estimates of older data in more recent geographies. In ArcGIS, this involves setting the ratio policy when creating a feature layer, to allow apportioning numeric values in attributes among the various overlapping geographies. This involves an assumption of an even geographic distribution of the variable across the entire area (which is not as sophisticated as some of the algorithms used by groups such as NHGIS).
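The apportioning step can be sketched in a few lines of Python. This is a minimal sketch of simple areal weighting, not ArcGIS's implementation: it assumes you have already intersected the old and new geographies in GIS software and exported the overlap area of each old/new pair, and all zone names and numbers below are hypothetical. Each old zone's count is split among the new zones in proportion to overlapping area, which is exactly the even-distribution assumption described above.

```python
from collections import defaultdict


def apportion_by_area(source_values, source_areas, overlaps):
    """Estimate counts for target zones by simple areal weighting.

    source_values: {source_zone: count}, e.g. 1990 population per old tract
    source_areas:  {source_zone: total area of that zone}
    overlaps:      {(source_zone, target_zone): area of their intersection}

    Assumes each count is spread evenly across its source zone.
    """
    estimates = defaultdict(float)
    for (source, target), overlap_area in overlaps.items():
        share = overlap_area / source_areas[source]  # fraction of the source zone in this target
        estimates[target] += source_values[source] * share
    return dict(estimates)


# Hypothetical example: two old tracts overlapping two new tracts (areas in sq. km)
old_population = {"OLD_A": 1000, "OLD_B": 500}
old_areas = {"OLD_A": 10.0, "OLD_B": 5.0}
intersections = {
    ("OLD_A", "NEW_X"): 6.0,
    ("OLD_A", "NEW_Y"): 4.0,
    ("OLD_B", "NEW_Y"): 5.0,
}
print(apportion_by_area(old_population, old_areas, intersections))
```

Here NEW_X receives 60% of OLD_A's population, and NEW_Y receives the remaining 40% plus all of OLD_B. When the new zones fully cover the old ones, the estimates add back up to the original total.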
Another research strategy employs crosswalks to harmonize census data over time. Crosswalks are tables that let you proportionally assign data from one year to another or to re-aggregate from one type of geography to another. Some of these are provided by the NHGIS geographic crosswalk files, the Census Bureau’s geographic relationship files, and the Geocorr utility from the Missouri Census Data Center.
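Applying a crosswalk amounts to a weighted sum: each source-unit value is multiplied by the weight linking it to a target unit, and the products are added up for each target unit. The Python sketch below assumes a simple three-column crosswalk (source ID, target ID, weight); the column names and tract IDs are hypothetical, and real crosswalk files such as those from NHGIS have their own layouts and may include several weight columns, so check their documentation before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical crosswalk: the share of each 2000 tract's counts assigned to each 2010 tract.
CROSSWALK_CSV = """source_id,target_id,weight
T2000_01,T2010_01,1.0
T2000_02,T2010_02,0.7
T2000_02,T2010_03,0.3
T2000_03,T2010_03,1.0
"""


def apply_crosswalk(values, crosswalk_rows):
    """Reallocate source-unit counts to target units using crosswalk weights."""
    totals = defaultdict(float)
    for row in crosswalk_rows:
        source = row["source_id"]
        if source in values:  # skip crosswalk rows for units we have no data for
            totals[row["target_id"]] += values[source] * float(row["weight"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(CROSSWALK_CSV)))
counts_2000 = {"T2000_01": 1200, "T2000_02": 800, "T2000_03": 150}
print(apply_crosswalk(counts_2000, rows))
```

In this made-up table, T2000_02 is split 70/30 between two 2010 tracts, while T2010_03 also absorbs all of T2000_03, so the same structure handles both splitting and re-aggregation.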
One of the attractive features of Tableau for visualization is that it can produce maps in addition to standard charts and graphs. While Tableau is far from being a full-fledged GIS application, it continues to expand its mapping capabilities, making it a useful option to show where something is located or to show how indicators are spatially distributed.
Here, we’re going to go over a few of Tableau’s mapping capabilities. We’ve also recorded a workshop with examples relating to this blog post’s discussion.
Tableau is a visualization tool: Tableau can quickly and effectively visualize your data, but it will not do specialized statistical or spatial analysis.
Tableau makes it easy to import data: A big advantage of Tableau is the simplicity of tasks such as changing variable definitions between numeric, string, and date, or filtering out unneeded columns. You can easily do this at the time you connect to the data (“connect” is Tableau’s term for importing data into the program).
Tableau is quite limited for displaying multiple data layers: Tableau wants to display one layer, so you need to use join techniques to connect multiple tables or layers together. You can join data tables based on common attribute values, but to overlay two geographic layers (stack them), you must spatially join one layer to one other layer based on their common location.
Tableau uses a concept that it calls a “dual-axis” map to allow two indicators to display on the same map or to overlay two spatial layers. If, however, you do need to overlay a lot of data on the same map, consider using proper GIS software.
Displaying paths on a map requires a special data structure: In order for tabular data with coordinate values (latitude/longitude) to display as lines on a map, you need to include a field that indicates drawing order. Tableau constructs the lines like connect-the-dots, each row of data being a dot, and the drawing order indicating how the dots are connected.
You might use this, for instance, with hurricane tracking data, each row representing measurements and location collected sequentially at different times. The illustration above shows Paris metro lines with the station symbol diameter indicating passenger volume. See how to do this in Tableau’s tutorial.
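Preparing such a table is mostly a matter of sorting. The Python sketch below takes hypothetical, made-up storm observations, sorts each storm's points by time, and adds a path_order column (a name chosen here for illustration) that could serve as the drawing-order field, for instance by placing it on the Path property of a Line mark in Tableau, with the storm field keeping separate tracks apart. The output is CSV text that Tableau can connect to.

```python
import csv
import io
from datetime import datetime

# Hypothetical, made-up track observations, deliberately out of order.
observations = [
    {"storm": "A", "time": "2020-09-02 12:00", "lat": 26.1, "lon": -78.0},
    {"storm": "A", "time": "2020-09-02 00:00", "lat": 25.0, "lon": -76.5},
    {"storm": "B", "time": "2020-09-05 06:00", "lat": 18.2, "lon": -60.3},
    {"storm": "A", "time": "2020-09-03 00:00", "lat": 27.4, "lon": -79.2},
    {"storm": "B", "time": "2020-09-05 00:00", "lat": 17.9, "lon": -59.1},
]


def add_path_order(rows):
    """Return copies of the rows sorted by storm and time, numbered 1, 2, 3... per storm."""
    def sort_key(row):
        return row["storm"], datetime.strptime(row["time"], "%Y-%m-%d %H:%M")

    ordered = []
    counters = {}
    for row in sorted(rows, key=sort_key):
        counters[row["storm"]] = counters.get(row["storm"], 0) + 1
        ordered.append({**row, "path_order": counters[row["storm"]]})
    return ordered


tracks = add_path_order(observations)

# Write CSV text suitable for connecting to in Tableau (or save it to a .csv file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["storm", "time", "lat", "lon", "path_order"])
writer.writeheader()
writer.writerows(tracks)
print(buffer.getvalue())
```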
You can take advantage of Tableau’s built-in geographies: Tableau has many built-in geographies (e.g., counties, states, countries), making it easy to plot tabular data that has an attribute with values for these geographic locations, even if you don’t have latitude/longitude coordinates or geographic files — Tableau will look up the places for you! (It won’t, however, look up addresses.)
Tableau also has several built-in base maps available for your background.
Tableau uses the “Web Mercator” projection: This is the same projection used by Google Maps. Small-scale maps (i.e., maps covering a large area) may look stretched out in an unattractive way, because the projection greatly exaggerates the size of areas near the poles.
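The stretching can be quantified. On the sphere that Web Mercator assumes, the linear scale factor at latitude φ is 1/cos φ, so areas are inflated by 1/cos² φ: about 1.33 times at 30°, 4 times at 60°, and nearly 15 times at 75°. The short Python sketch below applies the standard spherical Web Mercator formulas (x = Rλ, y = R ln tan(π/4 + φ/2), with R = 6,378,137 m) and computes that area factor; it illustrates the math and is not Tableau code.

```python
import math

R = 6378137.0  # sphere radius (m) used by Web Mercator: the WGS 84 semi-major axis


def web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in decimal degrees to Web Mercator x/y in meters."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def area_exaggeration(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude (1 / cos^2)."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 30, 60, 75):
    print(f"latitude {lat:>2}: area scale factor {area_exaggeration(lat):.2f}")
```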
Useful Mapping Capabilities
Plot points: Tableau works really well for plotting coordinate data (Longitude (X) and Latitude (Y) values) as points. The coordinates must have values in decimal degrees, with negative longitudes being west of Greenwich and negative latitudes being south of the equator.
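If your coordinates are recorded in degrees, minutes, and seconds with N/S/E/W letters, they need converting first. Below is a minimal Python sketch of that conversion, applying the sign convention above (west and south become negative); the function name and sample coordinates are illustrative only.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees.

    West longitudes and south latitudes come out negative.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in {"S", "W"} else value


# Illustrative point at 36°00'00" N, 78°56'24" W
latitude = dms_to_decimal(36, 0, 0, "N")
longitude = dms_to_decimal(78, 56, 24, "W")
print(round(latitude, 4), round(longitude, 4))
```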
Time slider: If you move a categorical “Dimension” variable onto Tableau’s Pages Card, you can get a value-based slider to filter your data by that variable’s values (date, for instance, as in Google Earth). This is shown in the image above.
Heatmap of point distribution: You can choose Tableau’s “Density” option on its Marks card to create a heatmap, which may display the concentration of your data locations in a smoother manner.
Filter a map’s features: Tableau’s Filter card is akin to ArcGIS’s Definition Query, to allow you to look at just a subset of the features in a data table.
Shade polygons to reflect attribute values: Choropleth maps (polygons shaded to represent values of a variable) are easy to make in Tableau. Generally, you’ll have a field with values that match a built-in geography, like countries of the world or US counties. But you can also connect to spatial files (e.g., Esri shapefiles or GeoJSON files), which is especially helpful if the geography isn’t built into Tableau (US Census Tracts are an example).
Display multiple indicators: Visualizing two variables on the same map is always problematic because the data patterns often get hidden in the confusion, but it is possible in Tableau. Use the “dual-axis” map concept mentioned above. An example might be pies for one categorical variable (with slices representing the categories) on top of choropleth polygons that visualize a continuous numeric variable.
Draw lines from tabular data: Tableau can display lines if your data is structured right, as discussed and illustrated previously, with a field for drawing order. You could also connect to a spatial line file, such as a shapefile or a GeoJSON file.
We’ve just given an overview of some of Tableau’s capabilities regarding spatial data. The developers are adding features in this area all the time, so stay tuned!
With the launch of the Duke University Energy Initiative (EI) several years ago, the Center for Data and Visualization Sciences (CDVS) has seen an increased demand for all sorts of data and information related to energy generation, distribution, and pricing. The EI is a university-wide, interdisciplinary hub that advances an accessible, affordable, reliable, and clean energy system. It involves researchers and students from the Pratt School of Engineering, the Nicholas School of the Environment, the Sanford School of Public Policy, the Duke School of Law, the Fuqua School of Business, and departments in the Trinity College of Arts & Sciences.
The EI website provides links to energy-related data sources, particularly datasets that have proven useful in Duke energy research projects. We will discuss below some more key sources for finding energy-related data.
Energy resources and potentials
The sources for locating energy data will vary depending on the type of energy and the spot on the source-to-consumption continuum that interests you.
The US Department of Energy’s (DoE’s) Energy Information Administration (EIA) has a nice outline of energy sources, with explanations of each, in its Energy Explained web pages. These include nonrenewable sources such as petroleum, natural gas, hydrocarbon gas liquids, coal, and nuclear. The EIA also discusses a number of renewable sources such as hydropower (e.g., dams, tidal, or wave action), biomass (e.g., waste or wood), biofuels (e.g., ethanol or biodiesel), wind, geothermal, and solar. Hydrogen is another fuel source discussed on these pages.
Besides renewability, you might also be interested in a source’s carbon footprint. Note that some of the sources the EIA lists as renewables may emit carbon (such as biomass or biofuels), and some nonrenewables may be carbon neutral (such as nuclear). Every type of energy source has environmental implications, and the Union of Concerned Scientists has a discussion of the Environmental Impacts of Renewable Energy Technologies.
For more on renewables, check out the NREL (National Renewable Energy Laboratory), which disseminates GIS data relating to renewable energy in the US (e.g., wind speeds, wave energy, solar potential), along with some international data. The DoE’s Open Data Catalog is also particularly strong on datasets (tabular and GIS) relating to renewables. The data ranges from very specific studies to US nationwide data.
For visualizing energy-related map layers from selected non-US countries, the Renewable Energy Data Explorer (REexplorer) provides an online mapping tool. Most layers can be downloaded as GIS files. The International Renewable Energy Agency (IRENA) also has statistics on renewables. Besides downloadable data, summary visualizations can be viewed online using Tableau Dashboards.
Price and production data
The US DoE “Energy Economy” web pages will introduce you to all things relating to the economics of energy, and the DoE’s EIA (mentioned above) is the main US source for fossil fuel pricing, from both the production and the retail standpoint.
Internationally, the OECD’s International Energy Agency (IEA) collects supply, demand, trade, production, and consumption data, including price and tax data, relating to oil, gas, and coal, as well as renewables. In the OECD iLibrary, go to the Statistics tab to find many detailed IEA databases, as well as PDF book series such as World Energy Balances, World Energy Outlook, and World Energy Statistics. For more international data (particularly for the developing world), you might want to try Energydata.info. This includes geospatial data and a lot on renewables, especially solar potential.
Finally, a good place to locate tabular data of all sorts is the database ProQuest Statistical Insight. It indexes publications from government agencies at all levels, IGOs and NGOs, and trade associations, usually providing the data tables or links to the data.
Infrastructure (Generation, Transportation/Distribution, and Storage)
Energy storage can include the obvious battery technologies, but also pumped hydroelectric systems and even more novel schemes. The US DoE has a Global Energy Storage Database with information on “grid-connected energy storage projects and relevant state and federal policies.”
For data or information relating to individual companies in the energy sector, as well as for more qualitative assessments of industry segments, you can begin with the library’s Company and Industry Research Guide. This leads to some of the key business sources that the Duke Libraries provide access to.
Trade associations that promote the interests of companies in particular industries can provide effective leads to data, particularly when you’re having trouble locating it from government agencies and IGOs/NGOs. If they don’t provide data or much other information on their websites, be sure to contact them to see what they might be willing to share with academic researchers. Most of the associations below focus on the United States, but some are global in scope.
The Data and Visualization Services (DVS) Department can help you locate and extract many types of data, including data about companies and industries. These may include data on firm location, aggregated data on the general business climate and conditions, or specific company financials. In addition to some freely available resources, Duke subscribes to a host of databases providing business data.
Directories of Business Locations
You may need to identify local outlets and single-location companies that sell a particular product or provide a particular service. You may also need information on small businesses (e.g., sole proprietorships) and private companies, not just on publicly traded corporations or the contact information for a company’s headquarters. A couple of good sources for such local data are the ReferenceUSA Businesses Database and SimplyAnalytics.
From these databases, you can extract lists of locations with geographic coordinates for plotting in GIS software, and SimplyAnalytics also lets you download data already formatted as GIS layers. Researchers often use this data when needing to associate business locations with the demographics and socio-economic characteristics of neighborhoods (e.g., is there a lack of full-service grocery stores in poor neighborhoods?).
Government surveys ask questions of businesses or samples of businesses. The data is aggregated by industry, location, size of company, and other criteria, and typically includes information on the characteristics of each industry, such as employment, wages, and productivity.
Macroeconomic indicators relate to the overall business climate, and a good source for macro data is Global Financial Data. Its data series includes many stock exchange and bond indexes from around the world.
Private firms also collect market research data through sample surveys. These are often from a consumer perspective, for instance to help gauge demand for specific products and services. Be aware that the numbers for small geographies (e.g., Census Tracts or Block Groups) are typically imputed from small nationwide samples, based on correlations with demographic and socioeconomic indicators. Examples of resources with such data are SimplyAnalytics (with data from EASI and Simmons) and Statista (mostly national-level data).
You may be interested in comparing numbers between companies, ranking them based on certain indicators, or gathering time-series data on a company to follow changes over time. Always be aware of whether the company is a publicly traded corporation or is privately held, as the data sources and availability of information may vary.
In the past, West Campus users would need to travel to the Ford Library at the Fuqua School of Business. This new arrangement allows them to access the Bloomberg service whenever Perkins Library is open. The service is available only to Duke students, faculty, and staff.
Data and News
Bloomberg Professional is an online service providing current and historical financial data on individual equities, stock market indices, fixed-income securities, currencies, commodities, futures, and foreign exchange for both international and domestic markets.
It also provides news on worldwide financial markets and industries as well as economic data for the countries of the world. Additionally, it provides company profiles, company financial statements and filings, analysts’ forecasts, and audio and video interviews and presentations by key players in business and finance (the Bloomberg Forum).
The Bloomberg Excel Add-in is a tool that delivers Bloomberg data directly into an Excel spreadsheet for custom analysis and calculations.
The dual monitors at each workstation provide plenty of real estate, enabling multiple windows for your research.
The Bloomberg keyboard is customized and color-coded so that users can quickly and easily access the information in the Bloomberg system and perform specific functions.
The red keys are used to log in to or log out of the system.
The yellow keys represent market sectors.
Green keys are action keys, to request the system to do something.
Often when using Bloomberg, your command might look something like this:
[TICKER] <MARKET> [FUNCTION CODE] <GO>
The system also allows standard mouse-clicking on the screens to activate many functions.
You may wish to become Bloomberg Certified, which requires successfully completing several online Bloomberg Essentials courses, found under the BESS command: four core courses plus one market sector course. You can complete these at your own pace, but you have only two chances to pass the test. Certification documents that you’ve gained comprehensive knowledge of the Bloomberg Professional service.
Bloomberg for Education doesn’t have the full functionality of the commercial version of Bloomberg Professional. For instance, there is a lag in stock quotes and data that makes it unsuitable for real-time analysis or trading, it has more limited downloading capabilities, and of course there’s no online trading.
You need to create your own personal login when you first access the system and will need to be near a cell phone to complete registration. You will get either a phone call or a text message with a validation code.
Once your personal login is validated and you open the Bloomberg Service, you can open Excel and then install the Excel Add-in (move mouse to lower edge of screen to activate Windows Start button, choose All Programs … Bloomberg … Install Excel Add-in). Then close and reopen Excel to display the Bloomberg tab for added functionality.
For help, please contact staff in the Library’s Data & GIS Services Dept. While we gather further documentation, you can use the green Help key on the Bloomberg keyboard, the EASY command, and the CHEAT command, or take a look at some of the following help guides compiled at other libraries. (Be aware that some of the instructions regarding access and logging in are specific to those institutions.)
A team of Duke undergraduates participating in the Global Health Capstone course was awarded the “Outstanding Capstone Research Project” for their examination of state and congressional district characteristics that might influence the outcome of legislative efforts to raise cigarette excise taxes in North Carolina, South Carolina, and Mississippi. Sarah Chapin and Gregory Morrison used GIS mapping tools in the Library’s Data & GIS Services Department to illuminate the relationships between county demographics and state legislators’ votes for or against cigarette tax hikes. Brian Clement, Alexa Monroy, and Katherine Roemer were other members of the research group. Congratulations!
The recent cigarette excise tax increases in Mississippi (2009), North Carolina (2009), and South Carolina (2010) served as case studies from which to draw components of successful strategies to develop a regional legislative toolkit for those wishing to increase cigarette excise taxes in the Southeast. In all of these states, the tax increase was controversial. The Southeast in general is tax averse, which presents a systemic challenge to those who advocate raising taxes on cigarettes.
The researchers examined state characteristics that might influence the outcome of efforts to raise excise taxes, such as the coalitions for and against proposed increases, the facts each side brought to bear and the nature of the discourse mobilized by different groups, the economic impact in each state of both smoking and the proposed excise taxes, and local political realities. The students restricted the area of interest to the Southeast because the region's states share a history and, consequently, similar challenges when it comes to race, poverty, and rural populations. These states are also, broadly speaking, politically similar and have had similar experiences with both tobacco use and government regulation.
This multi-disciplinary analysis provides a reference point for state legislators or interest groups wishing to pass cigarette tax increases. The deliverable provided a model of past voting trends, suggestions for framing political dimensions of the issue, and strategies to overcome opposition in state legislatures.
Comparing Legislative Districts and County Data
The bulk of the research involved mapping the political landscape surrounding cigarette tax legislation. In doing so, researchers looked at voting records, interest group politics, campaigns, and state ideology. Broadly, the research entailed charting the electoral geography by overlaying state house and senate districts with county-level data. Districts were coded based on voting history, party affiliation, smoking rates, and constituent demographics. State legislature websites were used to find representatives’ voting histories, allowing the researchers to match legislators by county when constructing a GIS dataset. County party affiliations are available through the state board of elections. Finally, county demographics came from 2010 Census data.
Besides using GIS mapping to illustrate these relationships, the researchers analyzed the involvement of major interest groups, using lobbying expenditures and campaign contributions to map the activity of both pro- and anti-tobacco interest groups. They also examined the impact of state ideology on the framing of political dimensions, looking at editorials, opinion pieces, newspaper coverage, and committee markups, as well as interviews with state legislators and interest groups (both existing interviews and ones the students conducted). Overcoming state ideology, both political and social, is a major factor in passing cigarette excise tax legislation, especially in a region with such a dominant tobacco influence.
Again, the purpose of the research is not merely to understand the political landscape surrounding the passage of cigarette tax bills, but to apply these findings to the creation of a legislative toolbox for representatives or interest groups seeking to push similar legislation.
Online mapping and data access have become even easier with the launch of SimplyMap 2.0. A longtime favorite of Economics and Public Policy courses (and faculty) at Duke, this web-based mapping and data extraction application provides a straightforward interface that lets users create thematic maps and reports using US census, business, and marketing data.
Version 2.0 includes improvements designed to make it easier to find and analyze data and create professional looking GIS-style thematic maps.
Significant changes include:
A new multi-tab interface to allow you to easily switch between your projects
Interactive wizards to guide you through making maps and reports
Option to automatically select the geographic unit displayed on a map based on the zoom level
Easier searching and browsing to choose data variables
Assign keyword tags to organize your maps and reports
Share your work with other users of SimplyMap (send a URL that lets them open a copy of your map or report)
Data filters (greater than, less than, etc.) can now be applied to both maps and reports
More export options: Data (Excel, DBF, CSV); Maps (GIF, PDF, shapefiles with boundaries only, no attributes)
Give SimplyMap 2.0 a try and let us know what you think. Support is always available in Perkins Data and GIS.
Do residential restrictions placed on convicted sex offenders serve to protect the public? Duke Economics Ph.D. candidate Songman Kang has been using the analytical capabilities of geographic information system (GIS) software to help determine the extent to which the restrictions affect the residential locations of sex offenders: computing the area covered by a restriction and determining which offenders had to relocate because of a restriction.
According to Kang, the residential restrictions are designed to reduce recidivism among sex offenders and to prevent their presence near places where children regularly congregate. However, neither of these claims has been found consistent with empirical evidence, and it is unclear whether the restrictions have reduced the rates of repeat sex offenses. On the other hand, the restrictions severely limit residential location choices and may force offenders to relocate away from employment opportunities and supportive networks of family and friends. As a result of these deteriorated economic conditions, offenders who had to relocate may become more likely to commit non-sex offenses.
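The relocation test described above comes down to a distance check between each residence and each restricted facility. The Python sketch below is our own illustration, not Mr. Kang's actual GIS procedure: it computes great-circle (haversine) distances and flags homes within a buffer distance of any facility. The coordinates and the 1,000-foot (about 305 m) buffer are hypothetical; actual restriction distances vary by jurisdiction.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_restricted_zone(home, facilities, buffer_m):
    """True if the home (lat, lon) lies within buffer_m of any facility (lat, lon)."""
    return any(haversine_m(*home, *facility) <= buffer_m for facility in facilities)


# Hypothetical data: one school location and two residences.
schools = [(36.0000, -78.9000)]
homes = {"home_1": (36.0010, -78.9000), "home_2": (36.0200, -78.9000)}
BUFFER_M = 305  # hypothetical ~1,000-foot restriction

must_relocate = [name for name, loc in homes.items() if in_restricted_zone(loc, schools, BUFFER_M)]
print(must_relocate)
```

GIS buffer tools in a projected coordinate system do the same job at scale, and also yield the area covered by the restriction zones.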
The following maps illustrate some of the restricted zones in Miami and in the Triangle area of North Carolina studied by Mr. Kang.
As water quality and questions of water supply have grown more salient in the Triangle, Duke researchers have tried to contribute to the growing debate over water quality using the latest digital mapping (GIS) tools. In the fall of 2009, Data and GIS Services in Perkins Library provided GIS analysis support for a stream and watershed assessment project that developed strategies to reverse the impact of poor urban stormwater management, degraded water quality, and the loss of natural habitats on the Duke campus.
Data/GIS helped the researchers access critical spatial data for the characterization of the contributing watershed’s current land use patterns. This data enabled the students to analyze the watershed’s area of impervious surface and hydrologic flow paths, and helped inform the understanding of the water quality issues faced at the stream site.
The GIS map below illustrates how digital mapping tools can be used to summarize a large amount of complex data into a compelling presentation.
Special thanks to the interdisciplinary team of environmental and civil engineers, biology and environmental science majors, and a Nicholas MEM student who shared their project results: Alicia Burtner, Matt Ball, Nari Sohn, Avni Patel, Will Bierbower, Adam Nathan, Mike Schallmo, Justine Jackson-Ricketts, and Jai Singh.