
Catching up on computational biology resources

With the arrival of summer, now is a great time to catch up on these computational biology and bioinformatics resources:

BioStar: Have a question about bioinformatics, computational genomics, or biological data analysis but aren’t sure whom to ask? Try BioStar, an open online community of biologists ready to answer questions, even from “newbies”. You are also welcome to answer and comment on questions. The more you contribute, the more reputation points you earn toward your BioStar badge.

OpenHelix: The site provides a searchable collection of tutorials, training materials, and exercises on the most popular genomic resources. The folks at OpenHelix also contract with resource providers to offer onsite, hands-on workshops at institutions. While most of their tutorials and training materials require a subscription, they do offer a suite of free tutorials, including ones on the UCSC Genome Browser and the RCSB Protein Data Bank.

Database: The Journal of Biological Databases and Data Curation: While maybe not beach reading, Database is a nice complement to the Nucleic Acids Research annual database issue. This open-access journal, launched in 2009, aims to provide a “platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.”

Have a computational biology resource you would like to recommend? Please leave a comment.

Swimming in a Sea of Data

This post comes from Erika Kociolek, a second-year Master of Environmental Management student at the Nicholas School. The Data and GIS staff would like to congratulate Erika on successfully defending her project!

For about four months, I’ve been swimming in a proverbial sea of data on hypoxia (low dissolved oxygen concentrations) and landings in the Gulf of Mexico brown shrimp fishery. I’m a second-year Master of Environmental Management (MEM) student at the Nicholas School, focusing on Environmental Economics and Policy. I’ve been working with my advisor, Dr. Lori Bennear, to complete my master’s project (MP), an analysis that attempts to estimate the effect of hypoxia on landings and other economic outcomes of interest.

To do this, we are using data from the Southeast Area Monitoring and Assessment Program (SEAMAP), NOAA/NMFS, and a database of laws and policies related to brown shrimp that I compiled in fall 2010. By running regressions that difference out all variation in catch except that attributable to hypoxia, we can isolate hypoxia’s effect on the economic outcomes of interest. I’ve found that catch, revenue, catch per unit effort, and revenue per unit effort are all larger in the presence of summer hypoxia. However, when we look at catch by shrimp size, we see that in the presence of summer hypoxia, catch of larger shrimp decreases and catch of smaller shrimp increases significantly.
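For readers curious what “differencing out” variation can look like in practice, here is a minimal Python sketch of one common approach, a two-way fixed-effects (within) regression. It is not the specification used in this project: it assumes a balanced panel of hypothetical zones observed over several years, with zone and year fixed effects, and uses made-up, noise-free numbers so that the estimate recovers the true coefficient exactly. The function name `two_way_within` is hypothetical.

```python
import numpy as np

def two_way_within(y: np.ndarray, x: np.ndarray) -> float:
    """Slope of y on x after removing zone and year fixed effects.

    y and x are 2-D arrays shaped (n_zones, n_years) for a balanced panel.
    Subtracting row (zone) means and column (year) means, then adding back
    the grand mean, "differences out" anything that is constant within a
    zone or common to every zone in a given year.
    """
    def demean(a: np.ndarray) -> np.ndarray:
        return (a - a.mean(axis=1, keepdims=True)
                  - a.mean(axis=0, keepdims=True)
                  + a.mean())

    y_t = demean(y).ravel()
    x_t = demean(x).ravel()
    # OLS slope on the transformed data (both have mean zero, so no intercept)
    return float(x_t @ y_t / (x_t @ x_t))

# Hypothetical data: 4 zones x 5 years, 1 = hypoxic, 0 = not hypoxic
hypoxic = np.zeros((4, 5))
hypoxic[0, 2:] = 1.0
hypoxic[2, [1, 3]] = 1.0

zone_effect = np.array([[3.0], [1.0], [-2.0], [0.5]])    # shape (4, 1)
year_effect = np.array([[0.0, 1.2, -0.4, 0.8, 2.0]])     # shape (1, 5)
true_beta = 1.5

# Catch = baseline + zone effect + year effect + effect of hypoxia
catch = 10.0 + zone_effect + year_effect + true_beta * hypoxic

beta_hat = two_way_within(catch, hypoxic)
print(round(beta_hat, 6))  # 1.5
```

Subtracting zone and year means removes anything constant within a zone or shared by all zones in a year, so only the hypoxia-related variation is left to identify the coefficient. A real analysis would also need controls and standard errors.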

Getting to the point of discussing results has required a lot of data analysis, cleaning, management, and visualization. I used R, Stata, and ArcGIS, and even used video editing software to make dynamic graphics of my results, which have improved my own understanding of the raw data. For example, the video below, showing the change in hypoxia over time (1997-2004), was created using ArcGIS 10.

http://youtu.be/2YfYBE_Fe7U

Note: The maps in the video above use data from the Southeast Area Monitoring and Assessment Program (SEAMAP).

Hypoxia is a dynamic and complex phenomenon that varies in severity over time and across space; hypoxia in Gulf waters is more severe and widespread in summer. The model I’m using takes advantage of this variation to estimate the effect of hypoxia on catch and other economic outcomes. I created this video to show people the source of variation I’m exploiting. The maps draw on dissolved oxygen measurements and display them spatially.

We have dissolved oxygen measurements for most of the Gulf in summer (June) and fall (December). Each subarea-depth zone (see related map) that changes from salmon shading (not hypoxic) to red (hypoxic), or vice versa, represents variation in hypoxia that my models use to estimate the hypothesized effect.
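The idea that zones switching between salmon and red supply the useful variation can be sketched in a few lines of Python. This is a hypothetical illustration, not the project’s code: it assumes the commonly used cutoff of 2 mg/L dissolved oxygen for hypoxia (the post does not state a threshold), and the zone labels, readings, and function names are invented.

```python
HYPOXIA_THRESHOLD_MG_L = 2.0  # assumed cutoff; the post does not state one

def hypoxic(do_mg_l: float, threshold: float = HYPOXIA_THRESHOLD_MG_L) -> bool:
    """True if a dissolved oxygen reading counts as hypoxic (red on the map)."""
    return do_mg_l < threshold

def switching_zones(readings: dict[str, list[float]]) -> list[str]:
    """Return zones whose hypoxic status changes at least once over time.

    readings maps a zone label to its dissolved oxygen values, one per survey
    period, in chronological order. Zones that are always or never hypoxic
    show no change in status.
    """
    changed = []
    for zone, values in readings.items():
        statuses = [hypoxic(v) for v in values]
        # Compare each period's status with the next period's status
        if any(a != b for a, b in zip(statuses, statuses[1:])):
            changed.append(zone)
    return changed

# Hypothetical readings in mg/L for three survey periods
readings = {
    "zone_A": [4.1, 1.6, 3.8],  # salmon -> red -> salmon
    "zone_B": [5.0, 4.7, 4.9],  # never hypoxic
    "zone_C": [1.2, 0.9, 1.5],  # always hypoxic
    "zone_D": [3.0, 2.5, 1.8],  # becomes hypoxic in the last period
}
print(switching_zones(readings))  # ['zone_A', 'zone_D']
```

Zones A and D change color at least once, so they carry the kind of variation described above; zones B and C stay the same color throughout.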

Many thanks are due to my advisor, Dr. Bennear, as well as to the helpful folks at the Data/GIS lab, who have provided invaluable assistance with the data management and data visualization components of this project!

This research was funded by NOAA’s National Center for Coastal Ocean Science, Award #NA09NOS4780235.

Surveying Our Researchers

Understanding library users’ research goals remains a key element of the Perkins Library’s Strategic Plan. As part of the Library’s User Studies Initiative, Teddy Gray surveyed the Biology Department in fall 2010 to learn which tools and resources department members use in their research, what their data management needs are, and how the 2009 closing of the BES Library has affected them.

DATA AND DATA MANAGEMENT IN BIOLOGY
From 18 interviews with faculty, graduate students, postdocs, and lab managers, we learned, not surprisingly, that nearly all the interviewees use data in their research, most of which they generate themselves. Half incorporate data from others into their work, and nearly a third use sequence data from GenBank. Of the 12 interviewees who generate data in their labs, two-thirds archive their data in existing repositories.

In addition to the interviews, the survey examined research articles produced by Duke biologists in 2009, paying special attention to their methods sections and citation patterns. From this analysis, we found that nearly 40% of the authors deposited their research data in either GenBank or a journal archive, while only one author deposited data in another existing scientific repository. Likewise, nearly 40% of the authors used a general statistical package in their work (SAS and R being the most popular), while nearly half used a biology-specific statistical tool.

THE (RISING?) PREVALENCE OF R
Almost everyone interviewed uses statistical tools in their research, and over half now use R. Many also use biology-specific statistical programs.

PRINT VERSUS ELECTRONIC
All but one of the interviewees prefer online versions of library materials to print. A third use image databases (primarily Google Images) in their teaching and presentations; however, only one interviewee knew of subject-specific image databases such as the Biology Image Library. And while some interviewees missed the convenience of shelf browsing with the BES Library close by, all are happy with the daily document delivery to the building.

FINAL THOUGHTS
We are grateful to the Biology Department for their support (and time) in conducting this survey, and we plan to use the results as a basis for planning library services. Data and GIS Services is always interested in hearing more from Duke researchers about the nature of their research! Please let us know if you would like to discuss your research interests and/or library needs.

What’s hot in molecular biology databases

The journal Nucleic Acids Research has just published its 18th annual database issue. The issue summarizes 96 new and 83 previously reviewed molecular biology databases, including GenBank, ENA, DDBJ, and GEO. It also includes an editorial advocating the creation of a “community-defined, uniform, generic description of the core attributes of biological databases,” to be known as the BioDBCore checklist. Such a checklist would benefit both database users and providers: users would have a much easier time finding the appropriate resource, and providers would be able to highlight specialized resources and the lesser-known functionality of established databases.

Besides the databases reviewed in the current issue, Nucleic Acids Research maintains a select list of 1,330 molecular biology databases that have been profiled in various database issues over the past 18 years.
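To illustrate why a uniform description of databases could make resources easier to find, here is a small Python sketch. The fields are an illustrative assumption, not the actual BioDBCore attribute list, and the database names, URLs, and the `find` helper are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class DatabaseRecord:
    """A uniform description of one database (illustrative fields only)."""
    name: str
    url: str
    data_types: set[str] = field(default_factory=set)
    taxa: set[str] = field(default_factory=set)

def find(records: list[DatabaseRecord], data_type: str, taxon: str) -> list[str]:
    """Names of databases that cover both the requested data type and taxon."""
    return [r.name for r in records
            if data_type in r.data_types and taxon in r.taxa]

# Hypothetical records, all described with the same fields
records = [
    DatabaseRecord("SeqDB-example", "https://example.org/seqdb",
                   {"nucleotide sequence"}, {"Homo sapiens", "Mus musculus"}),
    DatabaseRecord("ExprDB-example", "https://example.org/exprdb",
                   {"gene expression"}, {"Homo sapiens"}),
    DatabaseRecord("FlyDB-example", "https://example.org/flydb",
                   {"gene expression", "nucleotide sequence"},
                   {"Drosophila melanogaster"}),
]

print(find(records, "gene expression", "Homo sapiens"))  # ['ExprDB-example']
```

Because every record uses the same fields, one query can be run across all of them, which is the kind of benefit to users the editorial describes.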