Category Archives: Data Management

Data and Visualization Spring 2016 Workshops

SPRING 2016: Data and Visualization Workshops

Interested in getting started in data-driven research or exploring a new approach to working with research data? Data and Visualization Services’ spring workshop series features a range of courses designed to showcase the latest data tools and methods. Begin working with data in our Basic Data Cleaning/Analysis or the new Structuring Humanities Data workshop. Explore data visualization in the Making Data Visual class. Our workshops offer a variety of approaches for meeting the challenges of 21st century data-driven research. Please join us!

Workshop by Theme

DATA SOURCES

DATA CLEANING AND ANALYSIS

DATA ANALYSIS

MAPPING AND GIS

DATA VISUALIZATION

* – For these workshops, no prior experience with data projects is necessary!  These workshops are great introductions to basic data practices.

Shapefiles vs. Geodatabases

Ever wonder what the difference between a shapefile and a geodatabase is in GIS and why each storage format is used for different purposes?  It is important to decide which format to use before beginning your project so you do not have to convert many files midway through your project.

Basics About Shapefiles:

Shapefiles are a simple storage format that has been used in Esri software since the 1990s, when Esri created ArcView (an early predecessor of ArcMap 10.3). Because the format is that old, shapefiles have many limitations:

  • Take up more storage space on your computer than a geodatabase
  • Do not support names in fields longer than 10 characters
  • Cannot store date and time in the same field
  • Do not support raster files
  • Do not store NULL values in a field; when a value is NULL, a shapefile will use 0 instead
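
The 10-character field-name limit is easy to check before you export data to a shapefile. The following is a minimal sketch in plain Python (not an ArcGIS tool); the field names are made up for illustration. It lists names that are too long and shows which names would collide if they were simply cut to 10 characters.

```python
MAX_FIELD_NAME = 10  # shapefile field-name limit noted in the list above


def names_too_long(field_names):
    """Return the field names that exceed the shapefile limit."""
    return [name for name in field_names if len(name) > MAX_FIELD_NAME]


def truncation_collisions(field_names):
    """Group names that become identical when cut to 10 characters."""
    groups = {}
    for name in field_names:
        groups.setdefault(name[:MAX_FIELD_NAME], []).append(name)
    return {short: names for short, names in groups.items() if len(names) > 1}


# Hypothetical attribute table fields.
fields = ["county_name", "population_2010", "population_2015", "area"]
too_long = names_too_long(fields)
collisions = truncation_collisions(fields)
print(too_long)
print(collisions)
```

Renaming the colliding fields yourself before exporting keeps the shortened names meaningful.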

Users are allowed to create points, lines, and polygons with a shapefile.  One shapefile must have at least 3 files but most shapefiles have around 6 files.  A shapefile must have:

  • .shp – this file stores the geometry of the feature
  • .shx – this file stores the index of the geometry
  • .dbf – this file stores the attribute information for the feature

All files for the shapefile must be stored in the same location with the same name or else the shapefile will not load.  When a shapefile is opened in Windows Explorer it will look different than when opened in ArcCatalog.
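
If a shapefile will not load, a quick first check is whether the three required files are all present under the same name. The sketch below uses only the Python standard library; the file names are hypothetical, and the check is case-sensitive on systems where file names are.

```python
import tempfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions with no file next to shp_path."""
    shp = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp.with_suffix(ext).exists()]


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    shp = Path(folder) / "landmarks.shp"
    shp.touch()
    shp.with_suffix(".dbf").touch()  # the .shx file is deliberately left out
    missing = missing_shapefile_parts(shp)

print(missing)
```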


 

Basics About Geodatabases:

Geodatabases allow users to thematically organize their data and store spatial databases, tables, and raster datasets.  There are two types of single user geodatabases: File Geodatabase and Personal Geodatabase.  File geodatabases have many benefits including:

  • A storage limit of 1 TB for each dataset
  • Better performance than a Personal Geodatabase
  • Many users can view data inside the File Geodatabase while the geodatabase is being edited by another user
  • The geodatabase can be compressed, which reduces the geodatabase’s size on disk

On the other hand, Personal Geodatabases were originally designed to be used in conjunction with Microsoft Access and the Geodatabase is stored as an Access file (.mdb).  Therefore Personal Geodatabases can be opened directly in Microsoft Access, but the entire geodatabase can only have 2 GB of storage.

To organize your data into themes you can create Feature Datasets within a geodatabase. Feature datasets store Feature Classes (the equivalent of shapefiles) that share the same coordinate system. As with shapefiles, users can create points, lines, and polygons with feature classes; feature classes can also store annotation and dimension features.

Geodatabase

In order to create advanced datasets in ArcGIS (for example, to add a network dataset, a geometric network, a terrain dataset, or a parcel fabric, or to run topology on an existing layer), you will need to create a Feature Dataset.

You will not be able to work with the individual files of a File Geodatabase in Windows Explorer. A geodatabase named Durham_County, for example, appears there as a Durham_County.gdb folder filled with system files that are not meant to be opened individually.

 

Tips:

  • Whenever you copy shapefiles, use ArcCatalog. If you use Windows Explorer and do not select all the files for a shapefile, the copy will be corrupt and will not load.
  • When using a geodatabase, use a File Geodatabase. There is more storage capacity, multiple users can view/read the database at the same time, and the file geodatabase runs tools and queries faster than a Personal Geodatabase.
  • Use a shapefile when you want to read the attribute table or when you have only one or two tools or processes to run. Long-term projects should be organized into a File Geodatabase and Feature Datasets.
  • Many files downloaded from the internet are shapefiles. To convert them into your geodatabase, right click the shapefile, click “Export,” and select “To Geodatabase (single).”
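
ArcCatalog remains the safest way to copy a shapefile. If you ever need to copy one in a script instead, the important point from the first tip is to take every component file along. The sketch below is a plain-Python illustration with hypothetical file names: it copies every file whose name starts with the shapefile’s base name followed by a dot (so it would also pick up a file such as roads.shp.xml).

```python
import shutil
import tempfile
from pathlib import Path


def copy_shapefile(shp_path, dest_folder):
    """Copy a shapefile together with all of its component files."""
    src = Path(shp_path)
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    prefix = src.stem + "."  # e.g. "roads."
    copied = []
    for part in sorted(src.parent.iterdir()):
        if part.is_file() and part.name.startswith(prefix):
            shutil.copy2(part, dest / part.name)
            copied.append(part.name)
    return copied


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "source"
    source.mkdir()
    for name in ("roads.shp", "roads.shx", "roads.dbf", "roads.prj", "notes.txt"):
        (source / name).touch()
    copied = copy_shapefile(source / "roads.shp", Path(folder) / "backup")

print(copied)
```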


DVS Fall Workshops

Data and Visualization Services is happy to announce its Fall 2015 Workshop Series. With workshops covering everything from basic data skills to data visualization, we have courses for a wide range of interests and skill levels. New (and redesigned) workshops include:

  • OpenRefine: Data Mining and Transformations, Text Normalization
  • Historical GIS
  • Advanced Excel for Data Projects
  • Analysis with R
  • Web Scraping and Gathering Data from Websites

Workshop descriptions and registration information are available at:

library.duke.edu/data/news

 

    Date      Workshop
    Sep 9     OpenRefine: Data Mining and Transformations, Text Normalization
    Sep 15    Basic Data Cleaning and Analysis for Data Tables
    Sep 16    Introduction to ArcGIS
    Sep 18    Easy Interactive Charts and Maps with Tableau
    Sep 22    Introduction to Stata
    Sep 23    Historical GIS
    Sep 28    Advanced Excel for Data Projects
    Sep 29    Easy Interactive Charts and Maps with Tableau
    Sep 30    Analysis with R
    Oct 1     ArcGIS Online
    Oct 2     Web Scraping and Gathering Data from Websites
    Oct 6     Advanced Excel for Data Projects
    Oct 7     Basic Data Cleaning and Analysis for Data Tables
    Oct 14    Introduction to Stata
    Oct 15    Introduction to ArcGIS
    Oct 20    OpenRefine: Data Mining and Transformations, Text Normalization
    Oct 20    Analysis with R

 

ModelBuilder

Ever have trouble conceptualizing your project workflow? ModelBuilder allows you to plan your project before you run any tools. When using ModelBuilder in Esri’s ArcMap, you create a workflow of your project by adding the data and tools you need. To open ModelBuilder, click the ModelBuilder icon in the Standard Toolbar.

Key Points Before You Build Your Model

A model can only be created and saved in a toolbox. In order to create your model, you first need to create a new toolbox in the Toolboxes > My Toolboxes folder in ArcCatalog. Once you have a new toolbox, you will need to create a new Model; to do this, right click your newly created toolbox and select New, then Model. When you wish to open an existing model, find your toolbox, right click your Model, and select Edit.

In order to find the results of your model and the data created in the middle of your project workflow (also known as intermediate data), you will need to direct the data to any workspace or a Scratch Geodatabase.  To set your data results to a Scratch Geodatabase in ModelBuilder, click Model, then Model Properties.  A dialog box will open and you will want to select the Environments tab, Workspace category, and check Scratch Workspace.  Before closing the dialog box, select “Values” and navigate to your workspace or your geodatabase.


Building and Running a Model

To create a model, click the Add Data or Tool button. Navigate to the System Toolboxes, find the tool you wish to run, and add it to your model. Double click the tool within the Model and its parameters will open. Fill out the appropriate fields for the tool and select OK.

When the tools and variables are ready for processing, they will be colored blue, green, or yellow. Inputs are blue, tools are yellow, and outputs are green. When there is an error or the parameters have not been chosen, the tools and variables will have no color.

ModelBlog_Good

Once you have your model built, click the Run icon to run the model. Depending on the data and the number of tools you run, the Model can take seconds or minutes to run. You can also run one tool at a time; to do this, right click the tool and select “Run.” When the Model is done running, the tools and outputs will have a gray background. To find the results of your model, navigate to the Scratch Workspace you have set and add the shapefile or table to ArcMap. Alternatively, right-click the output variable before running the model and select “Add to Display.”

Applying ModelBuilder

The model above demonstrates how to take nationwide county data, North Carolina landmark data, and North Carolina major roads data and find landmarks in Wake County that are within 1 mile of major roads. The first tool in the model (the Select Layer by Attribute tool) extracts Wake County from the nationwide counties polygon layer.

Once Wake County is extracted to a new layer, the North Carolina landmarks layer is clipped to the Wake County layer using the Clip tool. The result is a landmarks point layer for Wake County. The third tool, Buffer, is run on the primary roads layer in North Carolina. Within the Buffer tool parameters, a distance of 1 mile is chosen and a new polygon layer is created.

Finally, the Wake County landmarks layer is intersected with the buffered major roads layer using the Intersect tool to create the final output.

Using ModelBuilder has many benefits: you document the steps you used to create your project, and you can easily rerun the model with different inputs after it is built. ModelBuilder allows users to easily determine if and where problems in the workflow are. When there is an error in the workflow, a “Failed to Execute” message will appear and tell users which tool was unable to execute. ModelBuilder also lets users easily change parameters. In the model used above, you could change the Expression in the Select Layer by Attribute tool from ‘Wake’ to ‘Durham’ and find landmarks within 1 mile of major roads in Durham County.
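
To make the logic of the four steps concrete, here is a rough sketch in plain Python. It does not use ArcGIS or ModelBuilder. The landmark and road data are invented, coordinates are treated as miles on a flat plane, and each landmark carries a county attribute as a stand-in for the Select and Clip steps. Buffering the roads and intersecting with the landmarks is replaced by an equivalent distance test.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def landmarks_near_roads(landmarks, roads, county, max_distance):
    """Names of landmarks in `county` within max_distance of any road."""
    # Stand-in for Select Layer by Attribute + Clip.
    in_county = [lm for lm in landmarks if lm["county"] == county]
    # Stand-in for Buffer + Intersect.
    result = []
    for lm in in_county:
        for road in roads:
            segments = zip(road, road[1:])
            if any(distance_to_segment(lm["xy"], a, b) <= max_distance
                   for a, b in segments):
                result.append(lm["name"])
                break
    return result


# Hypothetical data; each road is a list of (x, y) vertices.
landmarks = [
    {"name": "Library", "county": "Wake", "xy": (1.0, 0.5)},
    {"name": "Museum", "county": "Wake", "xy": (2.0, 3.0)},
    {"name": "Chapel", "county": "Durham", "xy": (0.5, 0.2)},
]
roads = [[(0.0, 0.0), (5.0, 0.0)]]

wake = landmarks_near_roads(landmarks, roads, "Wake", 1.0)
durham = landmarks_near_roads(landmarks, roads, "Durham", 1.0)
print(wake, durham)
```

As in the model, switching the county from ‘Wake’ to ‘Durham’ is a one-parameter change; nothing else in the workflow needs to be touched.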

Sharing Files: Your Duke Box.com

Last fall Duke University released its newest file sharing service, known as Duke’s Box. By partnering with Box.com, Duke offers a cloud-storage service which is intuitive, secure, and easy to use. Log in with your NetID, share files with colleagues, and have confidence this cloud storage is compliant with all laws and regulations regarding data privacy and security.

Simple to Use

Duke’s Box is similar to other cloud-based file storage services which support collaboration, productivity, and synchronization. You can drag and drop files, identify collaborators, and set permissions (read, edit, comment, etc.). But unlike some services, such as Dropbox or Google Drive, Duke’s Box enables you to be in compliance with data privacy and security requirements. Additionally, you can synchronize data across your devices, at your discretion and subject to Duke’s Security & Usage Practice restrictions.

While you may have previously used OIT’s NAS (Network Attached Storage) file storage service known as CIFS for data storage, Duke’s Box is easier to use, although it serves slightly different use-cases. For example, CIFS might be more useful if you are accessing large files (e.g. video files that are larger than 5 GB). However, CIFS doesn’t enable collaboration or sharing. Depending on your needs you may still want to use your departmental or OIT NAS. Either way, you can use both file storage services and each service is free.


50 GB of Space by Default

You are automatically provisioned 50 GB of space, but you can request more if you need it. See the Comparison of Document Management & Collaboration Tools at Duke for details.

Individual files are limited to less than 5 GB. This means Duke’s Box may be less than ideal for sharing very large files. NAS services may be more appropriate for large files, as the time to download or synchronize them can become inconvenient. But for many common file sharing cases, Duke’s Box is ideal, fast, and convenient.

Documentation, Restrictions & Use

While you can store many types of files, there are best practices and restrictions you will want to review.  For example, Duke Medicine users are required to complete an online training module prior to account activation.

Sharing Your Data With Us

One of the many use-cases for Duke’s Box is a more convenient way for you to share your data with us. As you know, we welcome questions about data analysis and visualization. We know that describing data can be difficult, and sharing your dataset can clarify your question. But sharing your data via email consumes a lot of resources, both yours and ours. Now there’s a better way; please share your data with us via Duke’s Box.

Steps for Sharing Your Data with DVS Consultants


  1. Log into Duke’s Box (use the blue “continue” button)
  2. Open your “home” folder
  3. Put your data in the “sharing” folder
  4. Use the “invite people” button (right-hand sidebar)
    • Using a consultant email address, invite the DVS Consultant to see your data.  (Don’t worry if you don’t have our email yet.  When you start your question at askData@duke.edu, an individual consultant will be back in touch.)

New Year- New Data and Visualization Lab!

Data and Visualization Services is happy to announce our new Data and Visualization Lab in Duke Libraries’ new Edge research space. Located on the first floor of the Bostock Library, the Brandaleone Family Lab for Data and Visualization Services offers a dedicated space for researchers working on data-driven projects.

The lab features three distinct areas for supporting data driven research.

Data and Visualization Lab Space

Data and Visualization Lab Computing Zone

Our lab space features twelve high-end workstations with the latest software for data visualization, digital mapping, statistics, and qualitative research. All of the machines have two dedicated displays to encourage collaborative work and data consultations. Additionally, all twelve machines have a dedicated power port located conveniently under the edge of the table for powering a laptop or USB-powered device.

Bloomberg Professional “Bar”


Since the launch of our Bloomberg terminals, we have seen a steady increase in both individual and team-based usage of Bloomberg financial data. Our three Bloomberg Professional workstations are now located on a dedicated “bar” across from our lab machines. The new Bloomberg zone will facilitate collaborative work and provide a base for groups such as the Duke University Investment Club and Duke Financial Economics Center.

Collaboration Zone

Our third lab space provides a set of four rolling tables for small groups to collaborate or for projects that don’t require a fixed computing space.   An 85″ flat panel display near this zone features data visualizations and other data driven research projects at Duke.

Come See Us!

The lab offers ample natural light, almost 24/7 availability, and a welcoming staff eager to work with you on your next data-driven project. We look forward to working with you in the upcoming year!

Meet Data and Visualization Services

The fall of 2014 marks the completion of the first five years of the libraries’ Data and GIS Services Department. In 2009, when Mark Thomas and I formed the department, the name accurately reflected our staffing and services as Mark focused on GIS-related issues and I focused on data-related issues. As an increasing number of scholars have embraced data-driven research over the last five years, our services and staff have grown to support an increasingly diverse set of research needs at Duke.

In the 2010-2011 academic year, the Libraries launched services around data management and sharing plans in anticipation of new funding rules surrounding research data. In 2012, the library expanded data services in collaboration with OIT’s Research Computing to offer one of the first data visualization consulting positions in the country. In 2013 and 2014, we expanded services and staff to include consultations on research computing and big data.

At this year’s Data and GIS Services annual retreat, we decided that the time has come to change the name of the department to reflect the broader range of staff and consulting services available. While we continue to support our traditional dimensions of data and GIS research, we intend to support a range of data needs across the following five themes:

Data and Visualization Services Themes

Data Sources
Get the data you need. Data and Visualization Services consultants can help you locate and license a diverse range of data sources.  We also provide long term storage for Duke data collections through Duke’s institutional repository.

Data Storage and Management
Need help on a data management plan, want advice on archiving, or struggling with “big data” analytics?  We are happy to consult!

Data Cleaning and Analysis
From Google Refine to the command line, we can help with data cleaning and analysis.

Mapping and GIS
Mapping and spatial analysis remain a core service for the data and visualization program.

Data Visualization
Our data visualization service can help with the most effective way to represent your data for both analysis and communication.

 

We appreciate the research community’s support as we’ve grown over the last five years.  We look forward to working with you on a larger range of data challenges in the future!

Data and GIS Services Spring 2014 Workshop Series

Explore network analysis, text mining, online mapping, data visualization, and statistics in our spring 2014 workshop series. Our workshops provide a chance to explore new tools or refresh your memory on effective strategies for managing digital research. Interested in keeping up to date with workshops and events in Data and GIS? Subscribe to the dgs-announce listserv or follow us on Twitter (@duke_data).

Currently Scheduled Workshops

 Thu, Jan 9 2:00 PM – 3:30 PM  Data Management Plans – Grants, Strategies, and Considerations

 Mon, Jan 13 2:00 PM – 3:30 PM Webinar: Social Science Data Management and Curation

 Mon, Jan 13 3:00 PM – 4:00 PM Google Fusion Tables

 Tue, Jan 14 3:00 PM – 4:00 PM Open (aka Google) Refine 

 Wed, Jan 15 1:00 PM – 3:00 PM Stata for Research

 Thu, Jan 16 3:00 PM – 5:00 PM Analysis with R

 Tue, Jan 21 1:00 PM – 3:00 PM Introduction to ArcGIS

 Wed, Jan 22 1:00 PM – 3:00 PM ArcGIS Online

 Wed, Jan 22 3:00 PM – 4:00 PM Open (aka Google) Refine 

 Mon, Jan 27 2:00 PM – 3:30 PM Introduction to Text Analysis

 Wed, Jan 29 1:00 PM – 3:00 PM Analysis with R

 Thu, Jan 30 2:00 PM – 4:00 PM Stata for Research

 Mon, Feb 3 1:00 PM – 2:00 PM  Data Visualization on the Web

 Mon, Feb 3 2:00 PM – 3:00 PM  Data Visualization on the Web (Advanced)

 Tue, Feb 11 2:00 PM – 4:00 PM Using Gephi for Network Analysis and Visualization

 Wed, Feb 12 1:00 PM – 3:00 PM Introduction to ArcGIS

 Tue, Feb 18 2:00 PM – 3:30 PM Introduction to Tableau Public 8

 Tue, Feb 25 1:00 PM – 3:00 PM ArcGIS Online

 Thu, Feb 27 1:00 PM – 3:00 PM Historical GIS

 Mon, Mar 3 2:00 PM – 3:30 PM  Designing Academic Figures and Posters

 Tue, Mar 4 1:00 PM – 3:00 PM  Useful R Packages: Extensions for Data Analysis, Management, and Visualization

Access your Duke-Cloud from ANYWHERE

Say you’ve been making hella maps or data stories all day. Now you need to move to your comfy work spot and you need your data to come with you.  If you use Duke’s CIFS, moving around is easy, and all of your files are already backed-up.

In this example we follow the researcher, Ms. Stu Fac-Staff.  Stu is part student, part faculty, and part staff at Duke University.  She needs a portable place for her data and wants easy access from her home, lab, and devices.  Stu also needs to easily share data with colleagues.  No problem!  Stu uses CIFS.

Here’s the scenario. Ms. Stu Fac-Staff walks into the Data & GIS Lab in the Duke University Libraries with a flash drive full of data tables. She gathers more supporting data and some advice about crunching the numbers. Stu finishes her day with a visualization and map. (Proudly, Stu imagines this is going to get the A. “Is this grant worthy?” Stu asks herself. “You bet your NSF Application it is!”) Meanwhile, her flash drive is now full and all she wants is to SAVE THE DATA, CONVENIENTLY for later retrieval back home. So Stu stores the data on the Duke Cloud (CIFS).

How do I get the free CIFS Space and how much can I use/access?

How do I access the data from my device?

  • In the Data & GIS Lab, after using your NetID to login, open the Windows File Explorer and your CIFS space will be mapped as drive Z.
  • After you leave our Data & GIS Lab, all you have to do is “map the drive” on your own machine.
  • Web – For easy distribution to colleagues, you might want to access or distribute your files through the web. To do this, store the files in your ‘public_html’ directory inside of your CIFS space. The files can then be downloaded via a web browser. This method is, by default, open to the world; you may want to take additional steps to secure this public_html directory (see below).

    http://people.duke.edu/~NetID

     

Can I Secure the Data?

  • Are you trying to access your mapped drive from off campus?
    • Use the VPN directions
    • The CIFS protocol encrypts NetID/password but it does not encrypt your data stream over the Internet.  If you’re connecting from an unencrypted or untrusted network (e.g. wireless in the coffee shop), the VPN allows for a secure connection.
  • Did you put files in your public_html folder?
    • Unlike the default CIFS space, placing files in the ‘public_html’ directory means they become accessible to the world
    • You can control and limit access by following OIT’s “htaccess” instructions

Data and GIS Fall 2013 Newsletter

Analyze, discover, manage, map, and visualize your data with Duke Libraries Data and GIS Services. Our team of five consultants provides a broad range of support in areas including data analysis, data visualization, geographic information systems, financial data, statistical software, and data storage and management. Our lab provides 12 workstations with the latest data software and three Bloomberg Professional workstations nearly 24/7 for the Duke community.

Data and GIS Workshop Series

All are welcome to the Data and GIS Workshop Series.  Analyze, communicate, clean, map, represent and visualize your data with a wide range of workshops on data based research methods and tools.  Details and registration for each class are available at the links that follow.  (Interested in keeping up to date with workshops and events in Data and GIS?  Just go to https://lists.duke.edu/sympa/info/dgs-announce and click on the “Subscribe” link at the bottom left.)

    Tue, Sep 3, 2013      1:00 PM - 3:00 PM    Introduction to ArcGIS    
    Wed, Sep 4, 2013     10:00 AM - 11:30 AM   Stata for Research    
    Wed, Sep 11, 2013    10:00 AM - 11:00 AM   Open (aka Google) Refine     
    Thu, Sep 12, 2013     1:00 PM - 3:00 PM    Analysis with R    
    Tue, Sep 17, 2013     1:00 PM - 2:30 PM    Introduction to Tableau Public 8    
    Thu, Sep 19, 2013    10:00 AM - 11:00 AM   Google Fusion Tables    
    Mon, Sep 23, 2013     1:00 PM - 2:30 PM    Introduction to Tableau Public 8    
    Tue, Sep 24, 2013     1:00 PM - 2:30 PM    Stata for Research    
    Mon, Sep 30, 2013    10:00 AM - 11:00 AM   Top 10 Dos and Don'ts for Charts and Graphs    
    Mon, Sep 30, 2013     1:00 PM - 3:00 PM    Introduction to ArcGIS    
    Tue, Oct 8, 2013      1:00 PM - 2:30 PM    Introduction to Text Analysis    
    Thu, Oct 10, 2013     1:00 PM - 3:00 PM    ArcGIS Special Topics: Geocoding & Proximity Analysis    
    Thu, Oct 17, 2013     1:00 PM - 3:00 PM    Historical GIS    
    Mon, Oct 28, 2013     1:00 PM - 2:00 PM    Designing Academic Figures and Posters    
    Tue, Oct 29, 2013     1:00 PM - 3:00 PM    Web GIS Applications

Data and GIS also offers instruction tailored to courses or research teams. Please contact askdata@duke.edu to schedule a session!

Data Management

Data Management Planning – DMPTool – Get 24/7 online help for your next data management plan, including information about Duke resources available for your data work.

Statistical Software Updates

Explore all of our Data and GIS Lab resources on our site at http://library.duke.edu/data/about/lab.html or come visit us on the second floor of Perkins Library.

Job Opportunities in Data and GIS Services

Data & GIS Services is hiring!  We have two open positions for student web programmers interested in working on data visualization projects.  See the Library Student employment page (http://library.duke.edu/jobs/students.html) for more information on how to apply.  (The job can be found by searching for requisition number “DUL14-AMZ02”.)

New Data and Map Collections

CPS on Web (CPS Utilities Online)
CPS on Web is a set of utilities enabling you to access CPS data and documentation from this website.   You may make tables and graphs from the CPS data, download data extractions, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones.

Global Financial Data
Global Financial Data is a collection of financial and economic data provided in ASCII or Excel format. Data includes: long-term historical indices on stock markets; Total Return data on stocks, bonds, and bills; interest rates; exchange rates; inflation rates; bond indices; commodity indices and prices; consumer price indices; gross domestic product; individual stocks; sector indices; treasury bill yields; wholesale price indices; and unemployment rates covering over 200 countries.

LandScan Global
The LandScan Global Population Database provides global population distribution in a gridded GIS format at 30 arc-second resolution (approximately 1×1 km cells). Oak Ridge National Laboratory developed modeling techniques to disaggregate and interpolate census data within administrative boundaries to create a GIS layer showing population distribution as accurately and as timely as possible. EastView provides this data to use in GIS software as a WMS (Web Mapping Service) or as a WCS (Web Coverage Service) to allow a user to incorporate population distribution into GIS mapping and analysis.
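
The “approximately 1×1 km” figure follows from simple arithmetic: 30 arc-seconds is 1/120 of a degree, and one degree is roughly 111 km at the equator, which gives a cell of roughly 0.93 km. The short sketch below works this out; the 111.32 km per degree value is an approximation assumed here, not a number from the LandScan documentation. It also shows that the east-west width of a cell shrinks with latitude.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator (assumed)


def cell_size_km(latitude_deg, arc_seconds=30):
    """Approximate (east-west, north-south) size of a grid cell in km."""
    degrees = arc_seconds / 3600
    north_south = degrees * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


ew_equator, ns_equator = cell_size_km(0)
ew_36, ns_36 = cell_size_km(36)  # about the latitude of Durham, NC
print(round(ew_equator, 2), round(ns_equator, 2))
print(round(ew_36, 2), round(ns_36, 2))
```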

Contact Us

email: askdata@duke.edu
twitter: duke_data or duke_vis