Category Archives: Open Data

OSF@Duke: By the Numbers and Beyond

The Open Science Framework (OSF) is a data and project management platform developed by the Center for Open Science that is designed to support the entire research lifecycle. OSF has a variety of features, including file management and versioning, integration with third-party tools, granular permissions and sharing capabilities, and communication functionalities. It also supports emerging scholarly communication formats, including preprints and preregistrations, which enable more open and reproducible research practices.

In early 2017, Duke University became a partner institution with the OSF. Through this partnership, Duke researchers can sign into the OSF using their NetID and affiliate a project with Duke, which allows it to be displayed on the Duke OSF page. After two years of supporting OSF for Institutions here at Duke, the Research Data Management (RDM) team wanted to better understand how our community is using the tool and what they think of it.

As of March 10, 2019, 202 users have signed into the system using their Duke credentials (and there are possibly more users authenticating with personal email accounts). These users have created 177 projects affiliated with Duke. Forty-six of these projects are public and 132 remain private. Duke users have also registered 80 Duke-affiliated projects, 62 of which are public and 18 embargoed. A registration is a time-stamped, read-only copy of an OSF project that can be used to preregister a research design, to create registered reports for journals, or, at the conclusion of a project, to formally record the authoritative copy of materials.
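
For readers who want to explore these numbers themselves, the public portion of this activity is visible through the OSF's REST API (https://api.osf.io/v2/). The Python sketch below is only a rough illustration, not an official Duke workflow: the institution identifier "duke" in the URL is an assumption (check https://api.osf.io/v2/institutions/ for the actual ID), and an unauthenticated request will only see public projects.

```python
# A minimal sketch that counts the public, Duke-affiliated projects returned by
# the OSF API v2. The institution identifier "duke" is an assumption -- check
# https://api.osf.io/v2/institutions/ for the actual ID.
import requests

INSTITUTION_NODES = "https://api.osf.io/v2/institutions/duke/nodes/"  # assumed ID

def count_public_projects(url: str) -> int:
    """Walk the paginated JSON:API response and count the nodes it returns."""
    total = 0
    while url:
        payload = requests.get(url, timeout=30).json()
        total += len(payload.get("data", []))
        url = payload.get("links", {}).get("next")  # None on the last page
    return total

if __name__ == "__main__":
    print("Public Duke-affiliated OSF projects:", count_public_projects(INSTITUTION_NODES))
```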

But what do OSF users think of the tool and how are they using it within their workflows? A few power users shared their thoughts:

Optimizing research workflows: A number of researchers noted how the OSF has helped streamline their workflows by creating a “central place that everyone has access to.” OSF has helped with “keeping track of the ‘right’ version of things” and with “bypassing the situation of having different versioned documents in different places.” Additionally, the OSF has supported “documenting workflow pipelines.”

Facilitating collaboration: One of the key features of the OSF is that researchers, regardless of institutional affiliation, can contribute to a project and integrate the tools they already use. Matt Makel, Director of Research at TIP, explains how OSF supports his research – “I collaborate with many colleagues at other institutions. OSF solves the problem of negotiating which tools to use to share documents. Rather than switching platforms across (or worse, within) projects, OSF is a great hub for our productivity.”

Offering an end-to-end data management solution: Some research groups are also using OSF in multiple stages of their projects and for multiple purposes. As one researcher expressed – “My research group uses OSF for every project. That includes preregistration and archiving research materials, data, data management and analysis syntax, and supplemental materials associated with publications. We also use it to post preprints to PsyArXiv.”

Responses also pointed to a broader shift in the norms of scholarly communication. As Elika Bergelson, Crandall Family Assistant Professor in Psychology and Neuroscience, aptly put it, “Open science is the way of the future.” Here within Duke University Libraries, we aim to continue supporting these shifting norms and the growing benefits of openness through services, platforms, and training.

To learn more about how the OSF might support your research, join us on April 3 from 10-11 am for a hands-on OSF workshop. Register here: https://duke.libcal.com/event/4803444

If you have other questions about using the OSF in a project, the RDM team is available for consultations or targeted demonstrations or trainings for research teams. We also have an OSF project that can help you understand the basic features of the tool.

Contact askdata@duke.edu to learn more or request an OSF demonstration.

ArcGIS Open Data

What is Open Data?

Finding data can be challenging.  Organizations and government agencies can share their data with the public using ESRI’s ArcGIS Open Data, a centralized spatial data clearinghouse.  Since its inception last year, over 1,600 organizations have provided more than 22,000 open datasets to the public.  Open Data allows users to find and download data in different formats, including shapefiles, spreadsheets, and KML documents, and it provides APIs (GeoJSON or Esri GeoServices) for calling the data into your own application.  It also lets you create various types of charts.
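
As a quick illustration of the API option, the short Python sketch below pulls a dataset into your own code as GeoJSON; the URL is a placeholder for the GeoJSON API link that Open Data exposes for each dataset.

```python
# A rough sketch of calling a dataset into your own application via its GeoJSON
# API link. The URL below is a placeholder -- copy the real GeoJSON link from
# the dataset's API options on the Open Data site.
import requests

GEOJSON_URL = "https://opendata.arcgis.com/datasets/EXAMPLE.geojson"  # hypothetical

response = requests.get(GEOJSON_URL, timeout=60)
response.raise_for_status()
collection = response.json()

# A GeoJSON FeatureCollection keeps its records under "features".
print("Features returned:", len(collection.get("features", [])))
```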

How to Find and Use Data

Open Data lets you type a geographic area or a topic of interest into a single search box.  Once you’ve found data that appears to be what you were looking for, you can use it for GIS purposes or work with its table to create charts and graphs.  If you are looking for GIS data, you can preview the spatial data before downloading by clicking the “Open in ArcGIS” icon, which takes you to ArcGIS Online where you can create choropleth maps and interact with the attribute table.  If you are interested in tabular data, you can filter it and create various types of charts.  If more analysis is necessary, you can download the data by clicking the “Download Dataset” icon; you are able to download either the entire dataset or the filtered dataset you’ve been working with.
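
For downloaded data, the same filtering and charting can also be reproduced in code. The sketch below assumes a CSV download; the filename and column names are hypothetical.

```python
# A generic sketch, assuming the dataset was downloaded as a CSV file; the
# filename and the "county", "year", and "value" columns are hypothetical.
import pandas as pd

df = pd.read_csv("downloaded_dataset.csv")

# Filter rows, much like filtering the table on the Open Data page.
subset = df[df["county"] == "Durham"]

# Build a quick bar chart, similar to the charts the site can generate.
ax = subset.plot.bar(x="year", y="value", legend=False)
ax.figure.savefig("chart.png")
```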

Tips

The Source and Metadata links below the “About” heading provide information about the data, including in-depth details such as descriptions, attributes, and how the data was collected.  Below the name of the dataset there are three tabs: “Details,” “Table,” and “Charts.”  The “Details” tab contains three sections: Description, Dataset Attributes, and Related Datasets.  The Dataset Attributes section outlines the fields found within the dataset and provides field type information, while the Related Datasets section links to other datasets with geographies or topics similar to the one you’ve chosen.  The “Table” tab lets you view and filter the entire table, and the “Charts” tab allows you to create different charts.

To stay current with updates to a dataset or to related articles, subscribe to the dataset you are interested in by copying the link provided into an RSS reader.  For specific data source questions, feel free to ask the Data and Visualization Department at askdata@duke.edu.
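
If you would rather check for updates from a script than from an RSS reader, something along these lines could do the same job; the feed URL is a placeholder for the link provided on the dataset page, and the third-party feedparser package is assumed to be installed.

```python
# A small sketch of reading a dataset's RSS feed programmatically. The feed URL
# is a placeholder -- use the link provided on the dataset's page.
import feedparser  # pip install feedparser

FEED_URL = "https://opendata.arcgis.com/datasets/EXAMPLE.rss"  # hypothetical

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Print each update's title and publication date, when available.
    print(entry.get("title", "(untitled)"), "-", entry.get("published", "n/a"))
```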

Wrangle, Refine, and Represent

Data visualization and data management were the core themes of the 2011 Computer Assisted Reporting (CAR) Conference, held in Raleigh from February 24-27.  Bringing together journalists, computer scientists, and faculty, the conference united a number of communities that share a common interest in gathering and representing empirical evidence online (and in print).

While the conference featured luminaries in data visualization (Amanda Cox, David Huynh, Michal Migurski, Martin Wattenberg) who gave sage advice on how best to represent data online, web-based data visualization tools provided a central focus.

Notable tools that may be of interest to the Duke research (and teaching) community include:

DataWrangler – An interactive data cleaning tool much like Google Refine (see below)

Google Fusion Tables – “manage large collections of tabular data in the cloud” – Fusion Tables provides convenient access to Google’s data visualization and mapping services.  The service also allows groups to annotate data online.

Google Refine – Refine is primarily a data cleaning tool that simplifies preparing data for further processing or analysis.  While users of existing data management tools may not be convinced to switch, Refine provides a rich suite of features that will likely attract many new converts.

Many Eyes – One of the premier online visualization tools, hosted by IBM.  Visualizations range from pie charts to digital maps to text analysis.  Many Eyes’ versatility is one of its key strengths.

Polymaps – Billed as a “javascript library for image- and vector-tiled maps” – Polymaps allows the creation of custom, lightweight map services on the web.

SIMILE Project (Semantic Interoperability of Metadata and Information in unLike Environments) – The SIMILE Project is a collection of different research projects designed to “enhance inter-operability” among digital assets.  At the conference, the Exhibit Project received particular attention for its ability to produce data-rich visualizations with very little coding required.

Timeflow – Presented by Sarah Cohen and designed by Martin Wattenberg, Timeflow provides a convenient application for visualizing temporal data.