Bloomberg Has Arrived

No, it’s not Michael Bloomberg, New York City’s mayor, but the financial data service that he founded back in 1981.

The Data & GIS Services Department of Perkins Library is pleased to announce the installation of three Bloomberg Terminals in the Data/GIS Computer Cluster (Perkins Room 226). The terminals are made possible with the generous assistance of the Duke Financial Economics Center in the Duke Department of Economics.

In the past, West Campus users had to travel to the Ford Library at the Fuqua School of Business to use Bloomberg.  This new arrangement allows them to access the Bloomberg service whenever Perkins Library is open.  The service is available only to Duke students, faculty, and staff.

Data and News

Bloomberg Professional is an online service providing current and historical financial data on individual equities, stock market indices, fixed-income securities, currencies, commodities, futures, and foreign exchange for both international and domestic markets.

It also provides news on worldwide financial markets and industries as well as economic data for the countries of the world.  Additionally, it provides company profiles, company financial statements and filings, analysts’ forecasts, and audio and video interviews and presentations by key players in business and finance (the Bloomberg Forum).

The Bloomberg Excel Add-in is a tool that delivers Bloomberg data directly into an Excel spreadsheet for custom analysis and calculations.
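
Once the add-in is installed (see the Login section below), Bloomberg data can be pulled into a worksheet with functions such as BDP (a current data point) and BDH (a history of data points).  A minimal illustration, using the PX_LAST (last price) field as an example; your own security identifiers and field mnemonics may differ:

    =BDP("IBM US Equity", "PX_LAST")
    =BDH("IBM US Equity", "PX_LAST", "1/1/2010", "12/31/2010")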

Bloomberg keyboard

Hardware

The dual monitors at each workstation provide plenty of real estate, enabling multiple windows for your research.

The Bloomberg keyboard is customized and color-coded so that users can quickly and easily access the information in the Bloomberg system and perform specific functions.

  • The red keys are used to log in to or log out of the system.
  • The yellow keys represent market sectors.
  • The green keys are action keys, used to tell the system to do something.

Often when using Bloomberg, your command might look something like this:
[TICKER] < MARKET > [FUNCTION CODE] < GO >
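
For example (an illustrative command only; other tickers, market sector keys, and functions work the same way), typing

    IBM US <EQUITY> DES <GO>

calls up the description page for IBM's common stock, where <EQUITY> is the yellow market sector key and DES is the description function.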

The system also allows standard mouse-clicking on the screens to activate many functions.

Bloomberg Certification

You may wish to become Bloomberg Certified, which requires the successful completion of several online Bloomberg Essentials courses: four core courses plus one market sector course, all found under the BESS command.  Complete these at your own pace, but note that you have only two chances to pass the test.  Certification documents that you have gained comprehensive knowledge of the Bloomberg Professional service.

Limitations

Bloomberg for Education doesn’t have the full functionality of the commercial version of Bloomberg Professional.  For instance, stock quotes and data are delayed, making the service unsuitable for real-time analysis or trading; downloading capabilities are more limited; and of course there’s no online trading.

Login

You need to create your own personal login when you first access the system, and you will need a cell phone nearby to complete registration.  You will receive either a phone call or a text message with a validation code.

Once your personal login is validated and you have opened the Bloomberg service, you can open Excel and install the Excel Add-in (move the mouse to the lower edge of the screen to activate the Windows Start button, then choose All Programs … Bloomberg … Install Excel Add-in).  Close and reopen Excel to display the Bloomberg tab and its added functionality.

Cheat Sheet to log in to Bloomberg at the Library

Assistance

For help, please contact staff in the Library’s Data & GIS Services Department.  While we assemble further documentation of our own, you can use the green Help key on the Bloomberg keyboard, the EASY command, and the CHEAT command, or consult some of the following help guides compiled at other libraries.  (Be aware that some of the instructions regarding access and logging in are specific to those institutions.)

Time Series Visualizations in ArcGIS – An Introduction

Introduction

ArcGIS 10 makes it easy to manage and visualize time-series data to identify trends and create compelling visualizations.  Creating a visualization of time-series data requires only a few additional steps beyond those needed to produce any map.

Step 1: Data Formatting

Time-series data contains records, each of which is specific both to an individual unit and to a single point in time.  The following example uses employment data for the textile industry in North Carolina from 2000 through 2009.

In this example, “fips” corresponds to each county’s unique FIPS code, “industry” corresponds to the textile industry’s NAICS code, and “t” denotes the year.  Establishments, employment, and annual pay, our data items, are stored in the fields “est”, “emp”, and “pay_ann”.  All missing values were coded ‘-1’.

Tip: Make sure each record has a value.  Records without values will not be drawn in ArcGIS.

Tip: Do not name the time field “year,” as it is a reserved name in ArcGIS.

In our experience, storing the data in a Microsoft Access database provides the greatest degree of reliability.
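
If your data start out as a CSV file or spreadsheet, a short script can apply the formatting tips above before you load the table into Access or ArcGIS.  Below is a minimal sketch in Python using pandas; the file name and column names are illustrative and follow this example rather than any particular dataset:

    import pandas as pd

    # Load the raw table (hypothetical file name for this example)
    df = pd.read_csv("nc_textile_employment.csv")

    # Avoid the reserved field name "year" by renaming it to "t"
    if "year" in df.columns:
        df = df.rename(columns={"year": "t"})

    # Code missing values as -1 so that every record has a value and will be drawn
    df[["est", "emp", "pay_ann"]] = df[["est", "emp", "pay_ann"]].fillna(-1)

    # Save the cleaned table for joining to a geographic layer
    df.to_csv("nc_textile_employment_clean.csv", index=False)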

Step 2: Add Data to Map in ArcGIS

Once the data is formatted, join the data to a geographic layer.  For help in finding a geographic layer, please consult the Perkins Data and GIS Services Department.

Tip: When joining layers, it is good practice to Verify the join selection before approving.  The program will inform you of any errors.
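
The join itself is normally done through the ArcMap interface (right-click the layer, then Joins and Relates > Join…), but the same step can be scripted.  The following is a minimal arcpy sketch under assumed paths, layer names, and field names; it is not the exact workflow used for this example:

    import arcpy

    # Assumed inputs: a county boundary layer and the formatted data table
    counties = r"C:\data\nc_counties.shp"
    table = r"C:\data\textiles.gdb\nc_textile_employment"

    # Create a layer so the join can be applied
    arcpy.MakeFeatureLayer_management(counties, "counties_lyr")

    # Join the data table to the layer on the shared county FIPS code
    arcpy.AddJoin_management("counties_lyr", "FIPS", table, "fips")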

Step 3: Enabling Time

Once the data are joined to a layer, enter the layer properties by right-clicking the layer name in the Table of Contents pane.

Navigate to the Time tab and check the box to enable time on the layer.  ArcGIS will ask which field contains the time information, as well as its format.  If the join was successful, you will see the fields for the data joined to the geographic layer.  In this example, the time field is labeled “t”.

You must also specify the date/time format, chosen from the list of available formats.

Finally, you will have to enable time on the data table as well.  To do this, right-click the data table in the Table of Contents pane.  Follow the same steps as presented for the geographic layer.

Step 4: Enable Time Display

Now that ArcGIS understands the data structure, you may enable time visualization.  The “Tools” toolbar, which contains the most commonly used tools, includes the “Open Time Slider Window” button.  Select this button.

The time slider window will appear.  The slider spans the time range of the data, identifies what point in this range is currently displayed on the map, and provides access to a variety of playback and recording options.  To access these options, click the options button.

The play button displays the data from the first time point to the last.

Buttons with both arrows and vertical lines are one-step increments: one moves forward one time increment, the other moves back.

The export button exports the display to video.  This is the final step.

Step 5: Configure Options and Visual Display

Before you export to video, you will want to configure the appearance of the map.  This example will focus on new options that come with time series data.

First, select “Options” in the Time Slider toolbar.  Under the “Time Display” tab, you can alter the format of the displayed date to conform to your data.  In this example, I selected 2011 (yyyy) because we are using annual data.

Second, under the “Playback” tab, you can specify a length of time for playback.  This example contains 10 years of data.  If I specify 5 seconds of playback, each data year will be displayed for one-half second.  If I specify 10 seconds, each year will be visible for 1 second.

Third, I will display the year in order to make clear to the viewer which time point is visible.  To do this, I will go to “Insert” > “Dynamic Text” > “Data Frame Time.”

Tip: Alternatively, you can insert the data frame time into the title or another display object by including the following in the text of the object: <dyn type="dataFrame" name="Layers" property="time" emptyStr="[off]"/>

After some trial and error, I successfully integrated the currently displayed time into the title.

Step 6: Export to Video

Once the appearance of the map is satisfactory, you can export the map to video or to sequential images.  Click the “Export to Video” button on the time slider window.

Tip: To improve the appearance of the video, maximize the ArcGIS window, switch to Layout View, zoom the layout to 100%, and clear any toolbars that may obstruct the layout view.

First, you will be asked for a file or folder location and the export format.  Videos are exported as AVI files, while sequential images are exported to a folder as either bitmaps or JPEGs.

Second, if you export to video, you will be asked to select a codec, which encodes and compresses the output video.  The codec selection depends on the individual machine, and some codecs work better with ArcGIS than others.

Finally, you may have to produce a video several times before it comes out as expected.  Be sure to watch for missing time points, which happen frequently.  Fixing the playback to a specific duration per time point (one-half second or one second) makes missing time points easier to spot.

The following example is a 5-second video that displays employment in the textiles industry in North Carolina from 2000 through 2009.  Note that declining employment is signified by colors that change from dark to light.

Catching up on computational biology resources

With the arrival of summer, now is a great time to catch up on these resources in computational biology and bioinformatics:

BioStar: Have a question on bioinformatics, computational genomics and biological data analysis but not sure who to ask? Try BioStar, which is an online open community of biologists ready to answer questions, even from “newbies”. You are also welcome to answer and comment on the questions. The more you do, the more reputation points you can earn toward your BioStar badge.

OpenHelix: The site provides a searchable collection of tutorials,  training materials, and exercises on the most popular genomic resources. The folks at OpenHelix also contract with resource providers to offer onsite, hands-on workshops at institutions. While most of their tutorials and training materials require a subscription, they do provide a suite of free tutorials, including ones on the UCSC Genome Browser and the RCSB Protein Data Bank.

Database: The Journal of Biological Databases and Data Curation: While maybe not beach reading, Database is a nice complement to the Nucleic Acids Research annual database issue. This open-access journal, launched in 2009, aims to provide a “platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.”

Have a computational biology resource you would like to recommend? Please leave a comment.

Where There’s Smoke …

A team of Duke undergraduates participating in the Global Health Capstone course was awarded the “Outstanding Capstone Research Project” for their examination of state and congressional district characteristics that might influence the outcome of legislative efforts to raise cigarette excise taxes in North Carolina, South Carolina, and Mississippi.  Sarah Chapin and Gregory Morrison used GIS mapping tools in the Library’s Data & GIS Services Department to illuminate the relationships between county demographics and state legislators’ votes for or against cigarette tax hikes. Brian Clement, Alexa Monroy, and Katherine Roemer were other members of the research group.  Congratulations!

Regional Focus
The recent cigarette excise tax increases in Mississippi (2009), North Carolina (2009), and South Carolina (2010) served as case studies from which to draw the components of successful strategies for a regional legislative toolkit aimed at those wishing to increase cigarette excise taxes in the Southeast.  In all of these states, the tax increase was controversial. The Southeast in general is tax averse, which presents a systemic challenge to those who advocate raising taxes on cigarettes.

Senate Votes & Poverty by County

The researchers examined state characteristics that might influence the outcome of efforts to raise excise taxes, such as the coalitions for and against proposed increases, the facts each side brought to bear and the nature of the discourse mobilized by different groups, the economic impact in each state of both smoking and the proposed excise taxes, and local political realities. The students restricted the area of interest to the Southeast because the region has a shared history and, consequently, similar challenges when it comes to race, poverty, and rural populations. These states are also, broadly speaking, politically similar and have had similar experiences with both tobacco use and government regulation.

This multi-disciplinary analysis provides a reference point for state legislators or interest groups wishing to pass cigarette tax increases.  The deliverable provided a model of past voting trends, suggestions for framing political dimensions of the issue, and strategies to overcome opposition in state legislatures.

Comparing Legislative Districts and County Data
Senate Votes & Party Affiliation

The bulk of the research involved mapping the political landscape surrounding cigarette tax legislation.  In doing so, the researchers looked at voting records, interest group politics, campaigns, and state ideology. Broadly, the research entailed charting the electoral geography by overlaying state house and senate districts with county-level data.  Districts were coded based on voting history, party affiliation, smoking rates, and constituent demographics.  State legislature websites were used to find representatives’ voting histories, allowing the researchers to match legislators by county when constructing a GIS dataset.  County party affiliations are available through each state’s board of elections.  Finally, county demographics came from the 2010 Census.

Senate Votes & Percent Black by County

Overcoming Ideology
Besides using GIS mapping to illustrate these relationships, the researchers analyzed lobbying expenditures and campaign contributions to map the involvement of both pro- and anti-tobacco interest groups.  Additionally, they examined the impact of state ideology on the framing of political dimensions, looking at editorials, opinion pieces, newspapers, and committee markups, as well as interviews (both previous interviews and ones they conducted) with state legislators and interest groups.  Overcoming state ideology, both political and social, is a major factor in passing cigarette excise tax legislation, especially in a region with such a dominant tobacco influence.

Again, the purpose of the research is not merely to understand the political landscape surrounding the passage of cigarette tax bills, but to apply these findings to the creation of a legislative toolkit for representatives or interest groups concerned with pushing similar legislation.

Swimming in a Sea of Data

This post comes from Erika Kociolek, a second year Master in Environmental Management student at the Nicholas School.  The Data and GIS staff want to congratulate Erika on successfully defending her project!

For about 4 months, I’ve been swimming in a proverbial sea of data related to hypoxia (low dissolved oxygen concentrations) and landings in the Gulf of Mexico brown shrimp fishery.  I’m a second year master of environmental management (MEM) student at the Nicholas School, focusing on Environmental Economics and Policy.  I’ve been working with my advisor, Dr. Lori Bennear, to complete my master’s project (MP), an analysis attempting to estimate the effect of hypoxia  on landings and other economic outcomes of interest.

To do this, we are using data from the Southeast Monitoring and Assessment Program (SEAMAP), NOAA/NMFS, and a database of laws and policies related to brown shrimp that I compiled in Fall 2010.  By running regressions that difference out all variation in catch except for that attributable to hypoxia, we can isolate its effect on economic outcomes of interest.  I’ve found that catch, revenue, catch per unit effort, and revenue per unit effort are all larger in the presence of summer hypoxia.  However, if we look at catch for different sizes of shrimp, we see that in the presence of summer hypoxia, catch of larger shrimp decreases and catch of smaller shrimp increases significantly.
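
The post describes the models only in general terms.  As a rough illustration of the kind of fixed-effects specification that differences out variation not attributable to hypoxia, here is a generic sketch in Python with statsmodels; the variable names and data file are made up, and the actual analysis was done in R and STATA:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical panel: one row per subarea-depth zone and year
    panel = pd.read_csv("shrimp_panel.csv")

    # Zone and year fixed effects absorb variation that is not due to hypoxia,
    # so the coefficient on hypoxic reflects within-zone changes over time
    model = smf.ols("log_catch ~ hypoxic + C(zone) + C(year)", data=panel).fit()
    print(model.summary())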

Getting to the point of discussing results has required a bunch of data analysis, cleaning, management, and visualization.  I have used R, STATA, and ArcGIS, and even video editing software to make dynamic graphics representing my results, which have improved my own understanding of the raw data.  As an example, the video below, showing the change in hypoxia over time (1997-2004), was created using ArcGIS 10.

http://youtu.be/2YfYBE_Fe7U

Note: The maps in the video above use data from the Southeast Monitoring and Assessment Program (SEAMAP).

Hypoxia is a dynamic and complex phenomenon, varying in severity over time and space; hypoxia in Gulf waters is more severe and widespread in summer.  The model I’m using actually takes advantage of this variation to obtain an estimate of the effect of hypoxia on catch and other economic outcomes.  To show people the source of variation I’m exploiting, I created this video.  The maps draw on data on dissolved oxygen concentrations and display them spatially.

We have dissolved oxygen measurements for most of the Gulf in the summer (June) and fall (December).  Each subarea-depth zone (see the related map) that changes from salmon shading (not hypoxic) to red (hypoxic), or vice versa, represents variation in hypoxia that my models use to estimate the hypothesized effect.

Many thanks are due to my advisor, Dr. Bennear, as well as to the helpful folks at the Data/GIS lab, who have provided invaluable assistance with the data management and data visualization components of this project!

This research was funded by NOAA’s National Center for Coastal Ocean Science, Award #NA09NOS4780235.

Surveying Our Researchers

Understanding library users’ research goals remains a key element of the Perkins Library’s Strategic Plan.  As part of the Library’s User Studies Initiative, Teddy Gray surveyed the Biology Department in the Fall of 2010 to discover what tools and resources departmental members use in their research, what data management needs researchers have, and what impact the closing of the BES Library in 2009 has had.

DATA AND DATA MANAGEMENT IN BIOLOGY
From the 18 interviews of faculty, graduate students, postdocs, and lab managers, we learned, not surprisingly, that nearly all the interviewees use data in their research, most of which they generate themselves. Half incorporate data from others into their work, with nearly a third using sequence data from GenBank. Of the 12 interviewees who generate data in their labs, two-thirds archive their data in existing repositories.

In addition to the interviews, this survey also examined research articles produced by Duke biologists in 2009, paying special attention to their methods sections and citation patterns. From this analysis of departmental research articles, we found that nearly 40% of the authors deposited their research data into either GenBank or a journal archive. Only one author deposited data into another existing scientific repository. Again, nearly 40% of the authors used a general statistical package in their work (SAS and R being the most popular), while nearly half used a biology-specific statistical tool.

THE (RISING?) PREVALENCE OF R
Almost everyone interviewed uses statistical tools in their research with over half now using R. Many also use biology-specific statistical programs.

PRINT VERSUS ELECTRONIC
All but one of the interviewees prefer the online versions of library material over print. A third use image databases, primarily Google Images, in their teaching and presentations; however, only one interviewee knew of subject-specific image databases such as the Biology Image Library. And while some interviewees missed the convenience of easy shelf browsing when the BES Library was so close by, all are happy with the daily document delivery to the building.

FINAL THOUGHTS
We are grateful to the Biology Department for their support (and time) in conducting this survey and plan to use the results as a basis for library services.  Data and GIS Services is always interested in hearing more from Duke researchers about the nature of your research! Please let us know if you would like to discuss your research interests and/or library needs.

Wrangle, Refine, and Represent

Data visualization and data management represented the core themes of the 2011 Computer Assisted Reporting (CAR) Conference that met in Raleigh from February 24-27.  Bringing together journalists, computer scientists, and faculty, the conference united a number of communities that share a common interest in gathering and representing empirical evidence online (and in print).

While the conference featured luminaries in data visualization (Amanda Cox, David Huynh, Michal Migurski, Martin Wattenberg) who gave sage advice on how best to represent data online, web-based data visualization tools provided a central focus for the conference.

Notable tools that may be of interest to the Duke research (and teaching) community include:

DataWrangler – An interactive data cleaning tool much like Google Refine (see below)

Google Fusion Tables – “manage large collections of tabular data in the cloud” – Fusion Tables provides convenient access to Google’s data visualization and mapping services.  The service also allows groups to annotate data online.

Google Refine – Refine is primarily a data cleaning tool that simplifies the process of preparing data for further processing or analysis.  While users of existing data management tools may not be convinced to switch, Refine provides a rich suite of features that will likely attract many new converts.

Many Eyes – One of the premier online visualization tools, hosted by IBM.  Visualizations range from pie charts to digital maps to text analysis.  Many Eyes’ versatility is one of its key strengths.

Polymaps – Billed as a “javascript library for image- and vector-tiled maps,” Polymaps allows the creation of custom lightweight map services on the web.

SIMILE Project (Semantic Interoperability of Metadata and Information in unLike Environments) – The SIMILE Project is a collection of research projects designed to “enhance inter-operability” among digital assets.  At the conference, the Exhibit Project received particular attention for its ability to produce data-rich visualizations with very little coding required.

Timeflow – Presented by Sarah Cohen and designed by Martin Wattenberg, Timeflow provides a convenient application for visualizing temporal data.

What’s hot in molecular biology databases

The journal Nucleic Acids Research has just published its 18th annual database issue. The current issue summarizes 96 new and 83 previously reviewed molecular biology databases, including GenBank, ENA, DDBJ, and GEO. Also included in the issue is an editorial advocating the creation of a “community-defined, uniform, generic description of the core attributes of biological databases,” which would be known as the BioDBCore checklist. Such a checklist would benefit both database users and providers: users would have a much easier time finding the appropriate resource, and providers would be able to highlight specialized resources and the lesser-known functionality of established databases.

Besides the databases reviewed in the current issue, Nucleic Acids Research maintains a select list of 1330 molecular biology databases that have been profiled in various database issues over the past 18 years.

SimplyMap! – Census and business data made easier

Online mapping and data access have become even easier with the launch of SimplyMap 2.0.  A longtime favorite of Economics and Public Policy courses (and faculty) at Duke, this web-based mapping and data extraction application provides a straightforward interface that lets users create thematic maps and reports using US census, business, and marketing data.

Screenshot: SimplyMap 2.0 map interface

Version 2.0 includes improvements designed to make it easier to find and analyze data and create professional looking GIS-style thematic maps.

Significant changes include:

  • A new multi-tab interface that lets you easily switch between your projects
  • Interactive wizards that guide you through making maps and reports
  • The option to automatically select the geographic unit displayed on a map based on the zoom level
  • Easier searching and browsing when choosing data variables
  • Keyword tags for organizing your maps and reports
  • The ability to share your work with other SimplyMap users (send a URL that lets them open a copy of your map or report)
  • Data filters (greater than, less than, etc.) that can now be applied to both maps and reports
  • More export options: Data: Excel, DBF, CSV;  Maps: GIF, PDF, Shapefiles (boundaries only, no attributes)
  • Faster performance

Give SimplyMap 2.0 a try and let us know what you think.  Support is always available in Perkins Data and GIS.

Policy Paradox: Mapping Residential Restrictions

Do residential restrictions placed on convicted sex offenders serve to protect the public?  Duke Economics Ph.D. candidate Songman Kang has been using the analytical capabilities of geographic information system (GIS) software to help determine the extent to which the restrictions affect the residential locations of sex offenders: computing the area covered by a restriction and determining which offenders had to relocate due to a restriction.
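
The post does not spell out the GIS steps, but one common way to compute restricted zones and flag affected addresses, sketched here with arcpy under assumed layer names and an assumed restriction distance of 1,000 feet, is to buffer the restricted sites and then select the offender addresses that fall inside the buffers:

    import arcpy

    # Assumed inputs: points for restricted sites (schools, parks, day cares)
    # and points for registered offender addresses
    sites = r"C:\data\restrictions.gdb\restricted_sites"
    offenders = r"C:\data\restrictions.gdb\offender_addresses"
    zones = r"C:\data\restrictions.gdb\restricted_zones"

    # Build the restricted zones as buffers around each site, dissolving overlaps;
    # zone areas can then be read from the output's Shape_Area field
    arcpy.Buffer_analysis(sites, zones, "1000 Feet", dissolve_option="ALL")

    # Select the offender addresses that fall inside a restricted zone
    arcpy.MakeFeatureLayer_management(offenders, "offenders_lyr")
    arcpy.SelectLayerByLocation_management("offenders_lyr", "INTERSECT", zones)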

According to Kang, the residential restrictions are designed to reduce recidivism among sex offenders and to prevent their presence near places where children regularly congregate.  Neither of these claimed benefits has been found to be consistent with the empirical evidence, however, and it is unclear whether the restrictions have been successful in reducing the rates of repeat sex offenses.  On the other hand, the restrictions severely limit residential location choices and may force offenders to relocate away from employment opportunities and supportive networks of family and friends.  As a result of these deteriorated economic conditions, offenders who had to relocate may become more likely to commit non-sex offenses.

The following maps illustrate some of the restricted zones in Miami and in the Triangle area of North Carolina studied by Mr. Kang.

Figure 1: Residential Restricted Zones in Miami

Figure 2: Triangle Restricted Residences