
Where can I find data (or statistics) on ___________?

Helping Duke students, staff and faculty to locate data is something that we in Data and Visualization Services often do.  In this blog post I will walk you through a sample search and share some tips that I use when I search for data and statistics.

“Hi there, I am looking for motorcycle registration numbers and sales volumes by age and sex for the United States.”

BREAKING DOWN THE QUESTION:

There are two types of data needed: motorcycle registration data and motorcycle sales data. There are two criteria that the data should be differentiated by: owner’s age and owner’s gender.
There is a geographic component: United States.

One criterion that is not given is time.  When a time frame isn’t provided, I assume that what is needed is the most current data available.  Keep in mind that “current” data will often still be a year or more old, because it takes time for data to be gathered, cleaned and published.

***Pro-tip: When you are looking for data consider who/what/when and where – adding in those components makes it easier to construct your search.***

WHERE AND HOW DO I SEARCH?

If I do not immediately have a source in mind (and sometimes even if I do, just to hit all the bases) I will use Google and structure my search as follows: motorcycle sales and registration by age and gender united states.
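If you like to keep track of your searches, the who/what/where/when breakdown can be sketched in a few lines of Python. This is only an illustration: the function name `build_search_query` and its parameters are my own invention, not part of any search tool.

```python
def build_search_query(what, who=None, where=None, when=None):
    """Join the who/what/where/when components into one search string."""
    parts = [what]
    if who:
        # "who" holds the characteristics the data should be broken down by
        parts.append("by " + " and ".join(who))
    if where:
        parts.append(where)
    if when:
        # Leave "when" out to look for the most current data available
        parts.append(str(when))
    return " ".join(parts)


query = build_search_query(
    what="motorcycle sales and registration",
    who=["age", "gender"],
    where="united states",
)
print(query)
```

Leaving `when` empty mirrors the assumption above that the most current data are wanted; adding a year narrows the search.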

***Pro-tip: You can use Google (or the search engine of your choice) to search across both the resources we subscribe to and the open Web, but you will need to be connected via a Duke IP address.***

EVALUATING RESULTS

One of the first results returned is from a database we subscribe to called Statista. This source gives me the number of motorcycle owners by age in 2018, which answers part of the question, but does not include sales information or a gender breakdown.

Another top result is a report on Motorcycle Trends in the United States from the Bureau of Transportation Statistics (BTS). Unfortunately, the report is from 2009 and the data cited in it are from 2003-2007.  A search of the BTS site does not yield anything more current. However, when I check the source list at the bottom of the report, there are several sources listed that I will check directly once I’ve finished looking through my search results.

***Pro-tip: Always look for sources of data in reports and figures, even if the data are old. Heading to the source can often yield more current information.***

A third result that looks promising is from a motorcycling magazine: Motorcycle Statistics in America: Demographics Change for 2018. The article reports on statistics from the 2018 owner survey conducted by the Motorcycle Industry Council (which is one of the sources that the Bureau of Transportation Statistics report listed). This article provides the percentage of motorcycle owners who are male and female as well as the median age of motorcycle owners.  While this is pretty close to the data needed, it is worthwhile to look into the Motorcycle Industry Council. Experience has taught me, however, that industry data typically are neither open nor freely available.

CHECKING THE COMMON SOURCE

When I go to the Motorcycle Industry Council (MIC) Web site I find that they do, indeed, have a statistical report that comes out every year which gives a comprehensive overview of the motorcycle industry.  If you are not a member, you can buy a copy of the report, but it is expensive (nearly $500).

***Pro-tip: Always check the original source even if you anticipate that there may be a paywall – it’s a good idea to evaluate all sources to ensure that they are credible and authoritative.***

MAKING A DECISION

In this instance, I would ultimately advise the person to use the statistics reported in the article Motorcycle Statistics in America: Demographics Change for 2018. Secondary sources aren’t ideal, and can sometimes be complicated to cite, but when you can’t get access to the primary source and that primary source is the authority, it is your best bet.

***Pro-tip: If you are using a secondary source, you should name the original source in the text, for example: Data from the 2018 Motorcycle Industry Council Owner Survey (as cited by Ultimate Motorcycling, 2019). In your reference list, cite the secondary source, formatted according to the style you are using.***

PARTING THOUGHTS

In closing, the data you want might not always be the data you use. The data may be proprietary or restricted, or they may simply not exist, or not exist in a form you need or are able to use.  When this happens, take a moment to think about your research question and determine whether you have the time and resources needed to continue pursuing it as it stands (purchasing, requesting, applying for, or collecting your own data), or whether you need to broaden or change your focus to incorporate the resources you do find in a meaningful way.

Welcome to the Current Population Survey on the Web

Duke University recently acquired access to the online version of its Current Population Survey (CPS) CD-ROM collection to facilitate easy access to CPS data (Unicon’s CPS Utilities on the Web).  This blog post will walk through the basic data extraction process.  The interface is comparable to that provided by the CD, and users of this collection will find the interface familiar and powerful.  Please note that the instructions provided on the web site are very important to read, particularly for those unfamiliar with the CPS CD version.

Create an Account

When you visit the Unicon site (http://unicon.com/), click the “CPS on Web” link to the left, then click the Register button.  You will have to enter some information to complete the registration process.

Once complete, submit the information.  Once the registration window closes, choose the CPS series (or month) you wish to query, and log in to the system.

 

Navigation and Data Extraction

Once logged in, you will see a popup window like that shown in the image to the right.  For a typical data extraction, the following steps are advised.

1) First, click the Set Option button and change the timeout to at least 300 seconds.  This helps prevent larger extractions from timing out before they finish.

2) Next, click the Make an Extraction button, followed by the Request Editor button on the next page.  You should see a page similar to that below (all variables used in your prior extraction will be listed).

3) Remove any variables you do not need.  Next, make certain the variable you wish to include is selected at the top and click “Add Variable(s).”  Alternatively, if you already know the names of the variables, you may type them into the boxes provided on the page.

4) Once all variables are added to the selection, click Continue.  On the following page, specify the output format for the dataset.  Once complete, be certain to select one or more years (at the top).  After you have selected years, click the Extract button.

5) On the following page, you will be presented with a list of variables by year.  As variables change across years in some cases, not all selected variables may be present for each year.  When selecting variables, checking the “View Documentation” checkbox at the top will allow for browsing of available years.

 

Other Useful Tools

– The Make a Table button allows for the construction of crosstabs of observations, means, and other statistics.  This is helpful if the goal is to locate variables for analysis or if there is a choice between two or more variables.

– The Make a Graph button is also useful for data exploration.  The program provides the ability to construct histograms, line charts, scatter plots, pie charts, and bar charts.  Basic summaries of a variable can also be generated from this page.

– If your data need to be weighted to represent the US population, be certain to select the appropriate weight under the Apply Weights button before extraction.

– Subsets of individuals can also be produced under the Specify Universe button.  For example, a specific race or gender can be specified to reduce the sample to what you need.
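To see why the weights and the universe matter once you have an extract in hand, here is a minimal Python sketch. Everything in it is assumed for illustration: the rows are invented, and the column names (`sex`, `employed`, `wgt`) are placeholders rather than actual CPS variable names.

```python
# Invented rows standing in for a small extract; the column names are
# placeholders, not real CPS variable names.
rows = [
    {"sex": "F", "employed": 1, "wgt": 1500.0},
    {"sex": "F", "employed": 0, "wgt": 500.0},
    {"sex": "M", "employed": 1, "wgt": 1000.0},
    {"sex": "M", "employed": 0, "wgt": 1000.0},
]


def weighted_share(rows, flag, weight, universe=lambda row: True):
    """Weighted share of rows where `flag` equals 1, within a universe."""
    total = 0.0
    hits = 0.0
    for row in rows:
        if not universe(row):
            continue  # outside the chosen universe, so skip the row
        total += row[weight]
        hits += row[weight] * row[flag]
    return hits / total if total else float("nan")


# Unweighted share: every row counts once.
unweighted = sum(r["employed"] for r in rows) / len(rows)

# Weighted share: each row counts for the number of people it represents.
overall = weighted_share(rows, "employed", "wgt")

# Restricting the universe, here to women only.
women = weighted_share(rows, "employed", "wgt", universe=lambda r: r["sex"] == "F")

print(unweighted, overall, women)
```

With these made-up rows the unweighted and weighted shares differ, which is the reason to apply the appropriate weight before extraction whenever results should represent the US population.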

ACS Mapping Extension for ArcGIS

The Census Bureau’s American Community Survey (ACS) provides a continuous measure of community demographics in the US.  A new extension provided by the Department of Geography and Geoinformation Science at George Mason University enhances the mapping of ACS data by allowing researchers to visualize survey estimates while also revealing the level of uncertainty in those estimates.  ACS Mapping Extensions is an ArcGIS add-on available for both ArcGIS 9.3 and 10.  This post provides a brief overview of installation, setup, and use.  Detailed technical assistance is provided by the extension.

 

 

Installation
1) Once you download the program, you will want to install and note the installation directory.  In ArcGIS, select Customize from the menu bar, and click Customize Mode….  Then select “Add from file…” and navigate to the installation directory.  Once in this directory, select the “ACSMapping.tlb” file.

 

2) Before you leave the Customize window, be sure to check the “ACS Mapping Tools” toolbar.  You will have a new “ACS Mapping” toolbar added to your window.

 

 

Setup
1) The “Documentation” option in the “ACS Mapping” toolbar provides detailed instructions for downloading ACS data and boundary files.  Follow these instructions to the letter and in their entirety.  With respect to boundary files, the TIGER 2008 county boundaries were used for this example.

 

2) Add the boundary layer to a blank map and select the “Join ACS Table(s) with Shapefiles” option in the “ACS Mapping” toolbar.  In this example, I have downloaded county boundaries and county-level median income data from the 2005-09 ACS.  In this figure, the first two fields indicate the items to be joined, one table to one shapefile.  “CNTYIDFP” represents the FIPS code in the boundary file, and “GEO_ID2” is the corresponding code in the ACS table.  Once you’ve set an output location, select “OK.”

 

3) Finally, you will want to apply a symbology to the layer.  In this case, I chose the median income estimate and 5 total categories.  The following figure shows what my map looks like at this point.
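The join in step 2 is a one-to-one match on the county FIPS code. Outside ArcGIS, the same idea can be sketched in plain Python. This is not how the extension does it internally; the rows and the `INCOME` and `INCOME_M` field names are invented for illustration, and only the `CNTYIDFP` and `GEO_ID2` key names come from the example above.

```python
# Invented rows standing in for the boundary attribute table and the ACS table.
boundaries = [
    {"CNTYIDFP": "37063", "NAME": "County A"},
    {"CNTYIDFP": "37037", "NAME": "County B"},
    {"CNTYIDFP": "37135", "NAME": "County C"},
]
acs_rows = [
    {"GEO_ID2": 37063, "INCOME": 50000, "INCOME_M": 1645},
    {"GEO_ID2": 37037, "INCOME": 52000, "INCOME_M": 3290},
]


def fips_key(value):
    """Normalize a county FIPS code to a 5-character string.

    Codes read as numbers lose their leading zeros, so pad them back.
    """
    return str(value).strip().zfill(5)


def join_on_fips(boundaries, acs_rows, left_key="CNTYIDFP", right_key="GEO_ID2"):
    """One-to-one join; returns joined rows and the keys with no ACS match."""
    lookup = {fips_key(row[right_key]): row for row in acs_rows}
    joined, unmatched = [], []
    for boundary in boundaries:
        key = fips_key(boundary[left_key])
        match = lookup.get(key)
        if match is None:
            unmatched.append(key)
        else:
            joined.append({**boundary, **match})
    return joined, unmatched


joined, unmatched = join_on_fips(boundaries, acs_rows)
print(len(joined), unmatched)
```

Checking the unmatched keys is a quick way to catch a boundary file and an ACS table that come from different vintages or geographic levels.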

 

 

Mapping ACS Estimates with Coefficients of Variation

1) The tools are located under the “Mapping Data Uncertainty” option in the ACS Mapping toolbar.  The first option, “Overlay CVs with Estimates,” will allow you to visualize the uncertainty of estimates at the same time as the estimates themselves.  As noted in the documentation provided on the ACS Mapping Extension web site, ACS publishes a margin of error for each estimate at the 90% confidence level.  This tool converts those margins of error into coefficients of variation, which allow you to assess the quality of the estimates.

 

2) Select the target layer to which you added symbology, select the variable that stores the estimate, and finally, select the variable that stores the margin of error (suffix = “_M”).

 

3) After you click the “Select” button, you will be presented with the new Symbology options for the new coefficients of variation layer to be generated.  In this case, I retained the automatic selections and hit “OK.”

 

4) Zooming in to central North Carolina, one can see not only that the Research Triangle area has relatively high incomes compared with much of North Carolina, but also that its coefficients of variation are lower than they are for parts of northern North Carolina and southern Virginia.
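For readers who want to check a coefficient of variation by hand, the conversion described in step 1 can be sketched as follows. It relies on the standard relationship for ACS 90% margins of error (standard error = margin of error / 1.645). The function name and the sample numbers are mine, and the post does not show how the extension itself rounds or classifies the results.

```python
Z_90 = 1.645  # z-value behind ACS margins of error at the 90% level


def coefficient_of_variation(estimate, moe, z=Z_90):
    """Return the CV, in percent, for an estimate and its margin of error."""
    if estimate == 0:
        return None  # the CV is undefined for a zero estimate
    standard_error = moe / z
    return 100.0 * standard_error / abs(estimate)


# Made-up example: an estimate of 50,000 with a margin of error of 1,645.
cv = coefficient_of_variation(50000, 1645)
print(round(cv, 2))
```

A lower CV means the estimate is more reliable relative to its size, which is what the overlay in step 4 makes visible on the map.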

 

 

Measuring Significant Differences in Income
1) The second option, “Identify Areas of Significant Differences,” allows you to assess whether there is a significant difference between one spatial unit and all other spatial units for a given variable.  In order for this option to work, you must select one specific spatial unit.  In this example, I selected Durham County and will assess whether there are significant differences in median household income in the region.

 

2) First, select the target layer for which you selected a single feature.  You want to verify the estimates and margin of error variables, and you can adjust the confidence level from the default 90%.  Select OK.

 

3) The output is represented by four different symbologies.  First, your chosen county is filled with dots.  All counties that are significantly different are striped, while all those that are not are empty.  Finally, when significance cannot be determined, the original color fill is replaced with a new color.  In this case, median household income is not significantly different between Durham and Chatham counties.  However, this could be due to small differences or large margins of error in one or both counties.
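The kind of comparison behind this map can be sketched with the usual test for two ACS estimates: convert each 90% margin of error to a standard error, then compare the difference with its combined standard error. This is a minimal sketch with invented numbers; the function name is mine, and I am assuming, not confirming, that the extension uses this same test.

```python
import math

Z_90 = 1.645  # z-value behind ACS margins of error at the 90% level


def significantly_different(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """Compare two estimates; return (is_significant, z_statistic)."""
    se_a = moe_a / Z_90  # published margins of error are at the 90% level
    se_b = moe_b / Z_90
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit, z


# Invented values: a 2,000 gap with a wide margin of error on one side...
close_call, z_small = significantly_different(50000, 1645, 52000, 3290)

# ...and a 10,000 gap with narrow margins of error on both sides.
clear_gap, z_large = significantly_different(50000, 1645, 60000, 1645)

print(close_call, round(z_small, 2))
print(clear_gap, round(z_large, 2))
```

The first comparison shows the situation described above: a difference can fail the test either because the estimates are close or because one margin of error is large. Passing a different `z_crit` (for example 1.96 for 95%) corresponds to adjusting the confidence level in step 2.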