Category Archives: Data Sources

Minding Your Business: Locating Company and Industry Data

The Data and Visualization Services (DVS) Department can help you locate and extract many types of data, including data about companies and industries.  These may include data on firm location, aggregated data on the general business climate and conditions, or specific company financials.  In addition to some freely available resources, Duke subscribes to a host of databases providing business data.

Directories of Business Locations

You may need to identify local outlets and single-location companies that sell a particular product or provide a particular service.  You may also need information on small businesses (e.g., sole proprietorships) and private companies, not just publicly traded corporations or contact information for a company’s headquarters.  A couple of good sources for such local data are the ReferenceUSA Businesses Database and SimplyAnalytics.

From these databases, you can extract lists of locations with geographic coordinates for plotting in GIS software, and SimplyAnalytics also lets you download data already formatted as GIS layers. Researchers often use this data when needing to associate business locations with the demographics and socio-economic characteristics of neighborhoods (e.g., is there a lack of full-service grocery stores in poor neighborhoods?).
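As a rough illustration of preparing such an export for GIS software, the sketch below converts a CSV list of business locations into GeoJSON, a format most GIS packages can read. The column names (`Company Name`, `NAICS`, `Latitude`, `Longitude`) and the sample rows are assumptions made for this example; an actual export from ReferenceUSA or SimplyAnalytics may label its fields differently.

```python
import csv
import io
import json


def locations_to_geojson(csv_text, lat_field="Latitude", lon_field="Longitude"):
    """Convert a CSV export of business locations into a GeoJSON FeatureCollection.

    The default field names are assumptions; adjust them to match your export.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        properties = {
            key: value for key, value in row.items()
            if key not in (lat_field, lon_field)
        }
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows standing in for a real export
sample_export = """Company Name,NAICS,Latitude,Longitude
Example Grocery A,445110,36.0014,-78.9382
Example Grocery B,445110,,-78.9000
"""

collection = locations_to_geojson(sample_export)
print(json.dumps(collection, indent=2))
```

Rows without usable coordinates are dropped rather than guessed, so it is worth comparing the number of features against the number of rows in the export before mapping.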

SimplyAnalytics

When searching these resources (or any business data source), it often helps to use an industry classification code to focus your search. Examples are the North American Industry Classification System (NAICS) and the Standard Industrial Classification (SIC) (no longer revised, but still commonly used). You can determine a code using a keyword search or drilling down through a hierarchy.
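Because NAICS codes are hierarchical, drilling down amounts to matching on a code prefix: each additional digit narrows the industry. The sketch below shows the idea on a small list of made-up businesses. The record layout and helper names are hypothetical and do not reflect how any particular database implements its search.

```python
# NAICS levels by code length (2 to 6 digits)
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_ancestors(code):
    """Return every level of the hierarchy that a NAICS code belongs to."""
    code = str(code).strip()
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError(f"not a 2- to 6-digit NAICS code: {code!r}")
    # Note: a few sectors are published as ranges (e.g., Retail Trade is 44-45),
    # so the two-digit prefix may be only one half of the sector label.
    return [(code[:n], NAICS_LEVELS[n]) for n in range(2, len(code) + 1)]


def filter_by_naics(records, prefix, field="naics"):
    """Keep records whose NAICS code starts with the given prefix."""
    return [r for r in records if str(r.get(field, "")).startswith(prefix)]


# Made-up businesses; the names are placeholders
records = [
    {"name": "Example Supermarket", "naics": "445110"},
    {"name": "Example Produce Market", "naics": "445230"},
    {"name": "Example Restaurant", "naics": "722511"},
]

print(naics_ancestors("445110"))
print(filter_by_naics(records, "445"))   # both food stores
print(filter_by_naics(records, "4451"))  # grocery stores only
```

Searching on a shorter prefix casts a wider net, which helps when you are unsure how a database has classified the businesses you are after.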

Aggregated Business and Marketing Data

Government surveys ask questions of businesses or samples of businesses. The data are aggregated by industry, location, size of company, and other criteria and typically include information on the characteristics of each industry, such as employment, wages, and productivity.

Sample Government Resources

Macroeconomic indicators relate to the overall business climate, and a good source for macro data is Global Financial Data. Its data series includes many stock exchange and bond indexes from around the world.

Private firms also collect market research data through sample surveys. These are often from a consumer perspective, for instance to help gauge demand for specific products and services. Be aware that the numbers for small geographies (e.g., Census Tracts or Block Groups) are typically imputed from small nationwide samples, based on correlations with demographic and socioeconomic indicators. Examples of resources with such data are SimplyAnalytics (with data from EASI and Simmons) and Statista (mostly national-level data).

Firm-Level Data

You may be interested in comparing numbers between companies, ranking them based on certain indicators, or gathering time-series data on a company to follow changes over time.  Always be aware of whether the company is a publicly traded corporation or is privately held, as the data sources and availability of information may vary.

For firm-level financial detail, public corporations traded in the US are required to submit data to the U.S. Securities and Exchange Commission (SEC).

SEC’s EDGAR Service

The SEC’s EDGAR service is the source of the corporate financials repackaged by commercial data providers, and you might find additional context and narrative analysis with products such as Mergent Online, Thomson One, or S&P Global NetAdvantage.  The Bloomberg Professional Service in the DVS computer lab contains a vast amount of data, news, and analysis on firms and economic conditions worldwide. You can find many more sources for firm- and industry-specific data from the library’s guide on Company and Industry Research, and of course at the Ford Library at the Fuqua School of Business.

All of these sources provide tabular download options.
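If you would rather pull filings data programmatically than download tables by hand, the SEC also publishes JSON endpoints on data.sec.gov. The sketch below builds the “company facts” URL for a company’s Central Index Key (CIK) and defines a helper to fetch it. The endpoint pattern and the requirement to send a descriptive User-Agent header reflect our understanding of the SEC’s developer guidance and should be checked against the current documentation. The function names are our own.

```python
import json
import urllib.request


def company_facts_url(cik):
    """Build the data.sec.gov URL for a company's XBRL 'company facts' JSON.

    The CIK is zero-padded to ten digits. The URL pattern is an assumption
    based on the SEC's published API guidance; verify it before relying on it.
    """
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def fetch_company_facts(cik, user_agent):
    """Download and parse the company facts JSON (requires network access).

    user_agent should identify you, e.g. "Your Name your.email@example.edu".
    """
    request = urllib.request.Request(
        company_facts_url(cik),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# 320193 is Apple Inc.'s CIK; this only builds the URL and makes no request
print(company_facts_url(320193))
```

Calling `fetch_company_facts` returns a nested dictionary of reported values that you would still need to flatten into a table for analysis.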

For help finding any sort of business or industry data, don’t hesitate to contact us at askdata@duke.edu.

Where can I find data (or statistics) on ___________?

Helping Duke students, staff and faculty to locate data is something that we in Data and Visualization Services often do.  In this blog post I will walk you through a sample search and share some tips that I use when I search for data and statistics.

“Hi there, I am looking for motorcycle registration numbers and sales volumes by age and sex for the United States.”

BREAKING DOWN THE QUESTION:

There are two types of data needed: motorcycle registration data and motorcycle sales data. There are two criteria that the data should be differentiated by: owner’s age and owner’s gender.
There is a geographic component: United States.

One criterion that is not given is time.  When a time frame isn’t provided, I assume that what is needed is the most current data available.  Something to consider is that “current” often will still be a year or more old. It takes time for data to be gathered, cleaned and published.

***Pro-tip: When you are looking for data consider who/what/when and where – adding in those components makes it easier to construct your search.***

WHERE AND HOW DO I SEARCH?

If I do not immediately have a source in mind (and sometimes even if I do, just to hit all the bases) I will use Google and structure my search as follows: motorcycle sales and registration by age and gender united states.

***Pro-tip: You can use Google (or the search engine of your choice) to search across things we subscribe to and the open Web, but you will need to be connected via a Duke IP address.***

EVALUATING RESULTS

One of the first results returned is from a database we subscribe to called Statista. This source gives me the number of motorcycle owners by age in 2018, which answers part of the question, but does not include sales information or a gender breakdown.

Another top result is a report on Motorcycle Trends in the United States from the Bureau of Transportation Statistics (BTS). Unfortunately, the report is from 2009 and the data cited in the article are from 2003-2007.  A search of the BTS site does not yield anything more current. However, when I check the source list at the bottom of the report, there are several listed that I will check directly once I’ve finished looking through my search results.

***Pro-tip: Always look for sources of data in reports and figures, even if the data are old. Heading to the source can often yield more current information.***

A third result that looks promising is from a motorcycling magazine: Motorcycle Statistics in America: Demographics Change for 2018. The article reports on statistics from the 2018 owner surveys conducted by the Motorcycle Industry Council (which is one of the sources that the Bureau of Transportation report  listed). This article provides the percent of males and females that own motorcycles as well as the median age of motorcycle owners.  While this is pretty close to the data needed, it is worthwhile to look into the Motorcycle Industry Council. Experience has taught me, however, that industry data typically is neither open nor freely available.

CHECKING THE COMMON SOURCE

When I go to the Motorcycle Industry Council (MIC) Web site I find that they do, indeed, have a statistical report that comes out every year which gives a comprehensive overview of the motorcycle industry.  If you are not a member, you can buy a copy of the report, but it is expensive (nearly $500).

***Pro-tip: Always check the original source even if you anticipate that there may be a paywall – it’s a good idea to evaluate all sources to ensure that they are credible and authoritative.***

MAKING A DECISION

In this instance, I would ultimately advise the person to use the statistics reported in the article Motorcycle Statistics in America: Demographics Change for 2018. Secondary sources aren’t ideal, and can sometimes be complicated to cite, but when you can’t get access to the primary source and that primary source is the authority, it is your best bet.

***Pro-tip: If you are using a secondary source, name the original source in the text, for example: Data from the 2018 Motorcycle Industry Council Owner Survey (as cited by Ultimate Motorcycling, 2019). Then include a citation to the secondary source in your reference list, formatted according to the style you are using.***

PARTING THOUGHTS

In closing, the data you want might not always be the data you use, whether because the data are proprietary or restricted, or because they simply don’t exist or don’t exist in a form you need and are able to use.  When this happens, take a moment to think on your research question and determine if you have the time and the resources needed to continue pursuing your question as it stands (purchasing, requesting, applying for, or collecting your own data), or if you need to broaden or change your focus to incorporate the resources you do find in a meaningful way.

Fall 2016 DVS Workshop Series

Data and Visualization Services is happy to announce its Fall 2016 Workshop Series. Learn new ways of enhancing your research with a wide range of data-driven research methods, data tools, and data sources.

Can’t attend a session?  We record and share most of our workshops online.  We are also happy to consult on any of the topics above in person.  We look forward to seeing you in the workshops, in the library, or online!

Data Sources
 
Data Cleaning and Analysis
 
Data Analysis
Introduction to Stata (Two sessions: Sep 21, Oct 18)
 
Mapping and GIS
Introduction to ArcGIS (Two sessions: Sep 14, Oct 13)
ArcGIS Online (Oct 17)
 
Data Visualization

Visualizing Qualitative Data (Oct 19)
Visualizing Basic Survey Data in Tableau – Likert Scales (Nov 10)

DVS Fall Workshops

Data and Visualization Services is happy to announce its Fall 2015 Workshop Series.  With workshops covering everything from basic data skills to data visualization, we have courses for a wide range of interests and skill levels.  New (and redesigned) workshops include:

  • OpenRefine: Data Mining and Transformations, Text Normalization
  • Historical GIS
  • Advanced Excel for Data Projects
  • Analysis with R
  • Web Scraping and Gathering Data from Websites

Workshop descriptions and registration information are available at:

library.duke.edu/data/news

 

Workshop                                                         Date
OpenRefine: Data Mining and Transformations, Text Normalization  Sep 9
Basic Data Cleaning and Analysis for Data Tables                 Sep 15
Introduction to ArcGIS                                           Sep 16
Easy Interactive Charts and Maps with Tableau                    Sep 18
Introduction to Stata                                            Sep 22
Historical GIS                                                   Sep 23
Advanced Excel for Data Projects                                 Sep 28
Easy Interactive Charts and Maps with Tableau                    Sep 29
Analysis with R                                                  Sep 30
ArcGIS Online                                                    Oct 1
Web Scraping and Gathering Data from Websites                    Oct 2
Advanced Excel for Data Projects                                 Oct 6
Basic Data Cleaning and Analysis for Data Tables                 Oct 7
Introduction to Stata                                            Oct 14
Introduction to ArcGIS                                           Oct 15
OpenRefine: Data Mining and Transformations, Text Normalization  Oct 20
Analysis with R                                                  Oct 20

 

ArcGIS Open Data

What is Open Data?

Finding data can be challenging.  Organizations and government agencies can share their data with the public using ESRI’s ArcGIS Open Data, a centralized spatial data clearinghouse.  Since its inception last year, over 1,600 organizations have provided more than 22,000 open datasets to the public.  Open Data allows users to find and download data in different formats, including shapefiles, spreadsheets, and KML documents, as well as APIs (GeoJSON or Esri GeoServices) to call the data into your own application.  It also lets you create various types of charts.


How to Find and Use Data

Open Data lets you type a geographic area or a topic of interest into a single search box.  Once you’ve found data that appears to be what you were looking for, you can use it for GIS purposes or use its table to create charts and graphs.  If you are looking for GIS data, you can preview the spatial data before downloading by clicking the “Open in ArcGIS” icon.  This takes you to ArcGIS Online, where you can create choropleth maps and interact with the attribute table.  If you are interested in tabular data, you can filter it and create various types of charts.  If more analysis is necessary, download the data by clicking the “Download Dataset” icon; you can download either the entire dataset or the filtered dataset you’ve been working with.
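Once a dataset has been downloaded as GeoJSON, the same filter-and-summarize steps can be repeated in a few lines of code. The sketch below uses a tiny made-up dataset; the field names (`NAME`, `TYPE`) and values are placeholders, since every Open Data dataset defines its own attributes.

```python
import json
from collections import Counter


def load_features(geojson_text):
    """Parse GeoJSON text and return its list of features."""
    data = json.loads(geojson_text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data["features"]


def filter_features(features, field, value):
    """Keep features whose attribute `field` equals `value`."""
    return [
        f for f in features
        if (f.get("properties") or {}).get(field) == value
    ]


def count_by(features, field):
    """Count features by the value of one attribute (the basis of a bar chart)."""
    return Counter((f.get("properties") or {}).get(field) for f in features)


# Made-up sample standing in for a downloaded dataset
sample_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-78.94, 36.00]},
         "properties": {"NAME": "Example Park A", "TYPE": "Park"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-78.90, 35.99]},
         "properties": {"NAME": "Example Trail B", "TYPE": "Trail"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-78.92, 36.02]},
         "properties": {"NAME": "Example Park C", "TYPE": "Park"}},
    ],
})

features = load_features(sample_geojson)
parks = filter_features(features, "TYPE", "Park")
print([f["properties"]["NAME"] for f in parks])
print(count_by(features, "TYPE"))
```

The counts returned by `count_by` are the kind of summary the site’s chart tools draw, so this is a convenient way to check a chart against the raw data.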


Tips

The Source and Metadata links below the “About” heading provide in-depth information about the data, such as descriptions, attributes, and how the data were collected.  Below the name of the dataset there are three tabs:  “Details,” “Table,” and “Charts.”  Under the “Details” tab there are three sections: Description, Dataset Attributes, and Related Datasets.  The Dataset Attributes section outlines the fields found within the dataset and provides field type information, while the Related Datasets section provides links to other datasets that have similar geographies or topics to the dataset you’ve chosen.  In the “Table” tab, you can view and filter the entire table in the dataset, and the “Charts” tab allows you to create different charts.

To obtain the most updated dataset or other updated articles related to the dataset, users should subscribe to the dataset they are interested in.  To subscribe, copy the link provided into an RSS Reader.  For specific data source questions, feel free to ask the Data and Visualization Department at askdata@duke.edu.

DataFest 2015 @ the Edge

Duke Libraries are happy to host the American Statistical Association’s DataFest competition the weekend of March 20-22.  Now in its fourth year at Duke, DataFest brings teams of students from across the Research Triangle to compete in a weekend-long competition that stresses data cleaning, analytics, and visualization skills.  The Edge provides a central location for the competition with facilities designed for collaborative, data-driven research.

While the deadline for forming DataFest teams has passed, Data and Visualization Services and Duke’s Department of Statistical Sciences are happy to offer another opportunity to participate in DataFest.  Starting Monday, March 16th we are offering four workshops on data analytics and visualization in the four days leading up to the DataFest event.  All workshops are open to the public, but we strongly encourage early registration to ensure a seat. Please come join us as we get ready to celebrate ASA DataFest 2015.

DataFest Workshop Series

Monday, March 16th, 6:00-8:00 PM – Introduction to R

Tuesday, March 17th, 1:30-3:00 PM – Easy Interactive Charts and Maps with Tableau

Wednesday, March 18th,  6:00-8:00 PM – Data Munging with R and dplyr

Thursday, March 19th, 7:00-9:00 PM – Visualization in d3

 

New Year- New Data and Visualization Lab!

Data and Visualization Services is happy to announce our new Data and Visualization Lab in Duke Libraries’ new Edge research space.  Located on the first floor of the Bostock Library, the Brandaleone Family Lab for Data and Visualization Services offers a dedicated space for researchers working on data-driven projects.

The lab features three distinct areas for supporting data driven research.

Data and Visualization Lab Space

Data and Visualization Lab Computing Zone

Our lab space features twelve high-end workstations with the latest software for data visualization, digital mapping, statistics, and qualitative research.  All of the machines have two dedicated displays to encourage collaborative work and data consultations.  Additionally, all twelve machines have a dedicated power port located conveniently under the edge of the table for powering a laptop or USB-powered device.

Bloomberg Professional “Bar”


Since the launch of our Bloomberg terminals, we have seen a steady increase in both individual and team-based usage of Bloomberg financial data.  Our three Bloomberg Professional workstations are now located on a dedicated “bar” across from our lab machines.  The new Bloomberg zone will facilitate collaborative work and provide a base for groups such as the Duke University Investment Club and Duke Financial Economics Center.

Collaboration Zone

Our third lab space provides a set of four rolling tables for small groups to collaborate or for projects that don’t require a fixed computing space.   An 85″ flat panel display near this zone features data visualizations and other data driven research projects at Duke.

Come See Us!

The lab offers ample natural light, almost 24/7 availability, and a welcoming staff eager to work with you on your next data-driven project.  We look forward to working with you in the upcoming year!

Welcome to the Current Population Survey on the Web

Duke University recently acquired access to the online version of its Current Population Survey (CPS) CD-ROM collection (Unicon’s CPS Utilities on the Web) to make CPS data easier to access.  This blog post walks through the basic data extraction process.  The interface is comparable to that provided by the CD, and users of that collection will find it familiar and powerful.  Please read the instructions provided on the web site, particularly if you are unfamiliar with the CPS CD version.

Create an Account

When you visit the Unicon site (http://unicon.com/), click the “CPS on Web” link to the left, then click the Register button.  You will have to enter some information to complete the registration process.

Once complete, submit the information.  Once the registration window closes, choose the CPS series (or month) you wish to query, and log in to the system.

 

Navigation and Data Extraction

Once logged in, you will see a popup window like that shown in the image to the right.  For a typical data extraction, the following steps are advised.

1) First, click the Set Option button and change the timeout to at least 300 seconds.  This will ensure successful data extraction.

2) Next, click the Make an Extraction button, followed by the Request Editor button on the next page.  You should see a page similar to that below (all variables used in your prior extraction will be listed).

3) Remove any variables you do not need.  Next, make certain the variable you wish to include is selected at the top and click “Add Variable(s).”  Alternatively, if you already know the names of the variables, you may type them into the boxes provided on the page.

4) Once all variables are added to the selection, click Continue.  On the following page, specify the output format for the dataset.  Once complete, be certain to select one or more years (at the top).  After you have selected years, click the Extract button.

5) On the following page, you will be presented with a list of variables by year.  As variables change across years in some cases, not all selected variables may be present for each year.  When selecting variables, checking the “View Documentation” checkbox at the top will allow for browsing of available years.

 

Other Useful Tools

– The Make a Table button allows for the construction of crosstabs of observations, means, and other statistics.  This is helpful if the goal is to locate variables for analysis or if there is a choice between two or more variables.

– The Make a Graph button is also useful for data exploration.  The program provides the ability to construct histograms, line charts, scatter plots, pie charts, and bar charts.  Basic summaries of a variable can also be generated from this page.

– If your data need to be weighted to represent the US population, be certain to select the appropriate weight under the Apply Weights button before extraction.

– Subsets of individuals can also be produced under the Specify Universe button.  For example, a specific race or gender can be specified to reduce the sample to what you need.
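
To see why the weight and universe choices above matter, the sketch below computes a mean from a small made-up extract three ways: unweighted, weighted, and weighted within a subset. The variable names (`age`, `sex`, `wgt`) and the values are hypothetical and do not correspond to actual CPS variable names or codes; check the documentation for the weight that applies to your series.

```python
import csv
import io


def weighted_mean(rows, value_field, weight_field, universe=None):
    """Weighted mean of value_field, optionally restricted to a universe.

    universe is a function that takes a row and returns True to keep it.
    """
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        if universe is not None and not universe(row):
            continue
        weight = float(row[weight_field])
        total += weight * float(row[value_field])
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no observations with positive weight in the universe")
    return total / weight_sum


# Made-up extract; each row stands in for one survey respondent
sample_extract = """age,sex,wgt
30,1,1500
50,2,500
40,2,1000
"""
rows = list(csv.DictReader(io.StringIO(sample_extract)))

unweighted = sum(float(r["age"]) for r in rows) / len(rows)
weighted = weighted_mean(rows, "age", "wgt")
subset = weighted_mean(rows, "age", "wgt", universe=lambda r: r["sex"] == "2")

print(f"Unweighted mean age:     {unweighted:.2f}")
print(f"Weighted mean age:       {weighted:.2f}")
print(f"Weighted mean, sex == 2: {subset:.2f}")
```

Respondents with larger weights stand in for more people, so the weighted and unweighted figures can differ noticeably even in a small sample.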