Fall 2016 DVS Workshop Series

Data and Visualization Services is happy to announce its Fall 2016 Workshop Series. Learn new ways of enhancing your research with a wide range of data-driven research methods, data tools, and data sources.

Can’t attend a session?  We record and share most of our workshops online.  We are also happy to consult on any of these topics in person.  We look forward to seeing you in the workshops, in the library, or online!

Data Sources
 
Data Cleaning and Analysis
 
Data Analysis
Introduction to Stata (Two sessions: Sep 21, Oct 18)
 
Mapping and GIS
Introduction to ArcGIS (Two sessions: Sep 14, Oct 13)
ArcGIS Online (Oct 17)
 
Data Visualization

Visualizing Qualitative Data (Oct 19)
Visualizing Basic Survey Data in Tableau – Likert Scales (Nov 10)

Data Fest 2016 Workshop Series

Duke Libraries are happy to welcome the 2016 ASA DataFest to the Edge on April 1-3.  As part of DataFest 2016, the Edge is hosting five DataFest-related workshops designed to help teams and others interested in data-driven research expand their skills.  All workshops will meet in the Edge Workshop Room (1st Floor, Bostock Library).  Laptops are required for all workshops.

We wish all the teams success in the competition and hope to see you in the next few weeks!


 

DataFest Workshop Series

Data Analysis with Python
Tuesday, March 22
6:00-9:00 PM
This will be a hands-on class focused on performing data analysis with Python. We’ll help participants set up their Jupyter Notebook development environment, cover the basic functions for reading and manipulating data, show examples of common statistical models and useful packages, and demonstrate some of Python’s visualization tools.
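For a flavor of the kind of notebook workflow the session covers, here is a minimal, hedged sketch using pandas and matplotlib; the file name and column name are hypothetical examples, not workshop materials.

```python
import pandas as pd
import matplotlib.pyplot as plt

# "survey_results.csv" and the "response" column are hypothetical examples.
df = pd.read_csv("survey_results.csv")    # read tabular data into a DataFrame
print(df.describe())                      # quick summary statistics
counts = df["response"].value_counts()    # basic manipulation/aggregation
counts.plot(kind="bar")                   # one of Python's plotting tools
plt.show()
```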

Introduction to R
Wednesday March 23
6:00-8:00 PM
Introduction to R as a statistical programming language. This session will introduce the basics of R syntax, getting data into R, various data types and classes, and more. The session assumes little or no background in R.

Data Munging with R and dplyr
Monday, March 28
6:00-8:00 PM
This session will demonstrate tools for data manipulation and cleaning in R. The majority of the session will use the dplyr and tidyr packages. Some background in R is recommended; if you are not familiar with R, make sure to attend the first R workshop in the series beforehand.

Data visualization with R, ggplot2, and shiny
Wednesday, March 30
6:00-8:00 PM
This session will demonstrate tools for static and interactive data visualization in R using the ggplot2 and shiny packages. Some background in R is recommended; if you are not familiar with R, make sure to attend the first R workshop in the series beforehand.

EDA and Interactive Predictive Modeling with JMP
Thursday, March 31
4:00-6:00 PM
JMP® Statistical Discovery Software is dynamic, visual, and interactive desktop software for Windows and Mac. In this hands-on workshop we will cover tools for exploring, visualizing, and preparing data in JMP. We’ll also learn how to fit a variety of predictive models, including multiple regression, logistic regression, classification and regression trees, and neural networks. A six-month license of JMP will be provided.

2016 Student Data Visualization Contest Winners

Thanks to an earlier fall deadline, we are ready to announce the winners of the fourth year of the Duke Student Data Visualization Contest.  The 14 visualizations submitted highlighted some very exciting visualization work being done by students of all ages here at Duke. The winners and other submissions to the contest will soon be featured on the Duke Data Visualization Flickr Gallery.

As in the past, the submissions were judged on the basis of five criteria: insightfulness, broad appeal, aesthetics, technical merit, and novelty.  The three winning submissions this year exemplify all of these and tell rich stories about three very different types of research projects. The winning submissions will be converted to larger poster versions and hung in the Brandaleone Lab for Data and Visualization Services (in the Edge).  Be on the lookout later this semester for a reception to celebrate their hard work!  The winners will also receive Amazon gift cards.  We are very grateful to Duke University Libraries for their continuing support of the contest.

First place:

Global Flows of Agriculture and Forestry Feedstocks
Brandon Morrison, Ph.D. Candidate (Division of Earth & Ocean Sciences, NSOE)


Second place:

Feature Interpretations from Ground Penetrating Radar at Vulci, Italy
Katherine McCusker, Ph.D. Student (Art History)


Third place:

Simulated Sediment Deposition at Continental Margins
Candise Henry, Ph.D. Student (Division of Earth & Ocean Sciences, NSOE)


Please join us in celebrating the outstanding work of these students!

Data and Visualization Spring 2016 Workshops


Interested in getting started in data-driven research or exploring a new approach to working with research data?  Data and Visualization Services’ spring workshop series features a range of courses designed to showcase the latest data tools and methods.  Begin working with data in our Basic Data Cleaning/Analysis or the new Structuring Humanities Data workshop.  Explore data visualization in the Making Data Visual class.  Our wide range of workshops offers a variety of approaches for meeting the challenges of 21st century data-driven research.   Please join us!

Workshop by Theme

DATA SOURCES

DATA CLEANING AND ANALYSIS

DATA ANALYSIS

MAPPING AND GIS

DATA VISUALIZATION

* – For these workshops, no prior experience with data projects is necessary!  These workshops are great introductions to basic data practices.

Duke welcomes artist/illustrator Jennifer McCormick

On the last day of classes, December 4, the Duke community will have a very special treat: a visit from artist and certified medical illustrator Jennifer McCormick.  Jennifer has been actively exhibiting and speaking about her work for several years, including a recent TEDx talk at Wake Forest University and an exhibit at the Durham Arts Council.

In Jennifer’s work as a medical illustrator, she partners with attorneys to create visualizations that explain complex injuries and medical procedures to jury members.  In her fine art, however, she builds on the histories and x-rays of patients to explore “an opportunity for healing, hope, and acceptance.”  Her unique pieces transform the original clinical imagery of the injury into gorgeous, natural, holistic scenes.  In her artist talks, she speaks of “the power of intention” and “our forgotten superpowers” to raise awareness of the importance of art and spirituality for healing.

Jennifer will join us for the final Visualization Friday Forum of the semester.  It will be an opportunity for visualization enthusiasts, clinicians, medical imaging specialists, legal scholars, and those interested in the intersection between health and art to gather together for a presentation and conversation.  The talk will occur in the standard time slot for the Visualization Friday Forum — noon on Friday, December 4 — but the location is changing to accommodate a larger audience.  For one week only, we will meet in Duke Hospital Lecture Hall 2003.

The Visualization Friday Forum is sponsored by the Duke University Libraries (Data and Visualization Services), Duke Information Science + Studies (ISS), and the DiVE group. Jennifer’s visit will also be sponsored by the Trent Center for Bioethics, Humanities & History of Medicine and Duke Law – Academic Technologies.

We are so excited Jennifer has agreed to travel to Duke for a visit.  Please mark your calendars for this event.  If you would like to speak with Jennifer about medical illustrations or the intersection between medicine and spirituality, please contact Angela Zoss.

Enter the 2016 Student Data Visualization Contest

Calling all Duke undergrad and grad students! Have you worked on a course or research project that included some kind of visualization? Maybe you made a map for a history class paper. Maybe you invented a new type of chart to summarize the results of your experiment. Maybe you played around with an infographic builder just for fun.

Now is the time to start thinking about submitting those visualizations to the Duke Student Data Visualization Contest. It’s easy — just grab a screenshot or export an image of your visualization, write up a short description explaining how you made it, and submit it using our Sakai project site (search for “2016 DataVis Contest”). The deadline is right after finals this fall, so just block in a little extra time at the end of the semester once you’re done with your final assignments and projects.

Not sure what would work as a good submission? Check out our Flickr gallery with examples from the past two years.

Not sure if you’re eligible? If you were a Duke student (that is, enrolled in a degree-granting program, so no post-docs) at any time during 2015, and you did the work while you were a student, you’re golden!

Want to know more about the technical details and submission instructions? Check out the full contest instruction site.

Shapefiles vs. Geodatabases

Ever wonder what the difference is between a shapefile and a geodatabase in GIS, and why each storage format is used for different purposes?  It is important to decide which format to use before beginning your project so you do not have to convert many files midway through.

Basics About Shapefiles:

Shapefiles are a simple storage format that has been used in ArcMap since the 1990s, when Esri created ArcView (an early predecessor of today’s ArcMap 10.3).  Because the format is so old, shapefiles have many limitations, such as:

  • Take up more storage space on your computer than a geodatabase
  • Do not support field names longer than 10 characters
  • Cannot store date and time in the same field
  • Do not support raster data
  • Do not store NULL values in a field; when a value is NULL, a shapefile uses 0 instead

Users can create points, lines, and polygons with a shapefile.  A shapefile consists of at least three files, though most have around six.  At a minimum, a shapefile must have:

  • .shp – this file stores the geometry of the feature
  • .shx – this file stores the index of the geometry
  • .dbf – this file stores the attribute information for the feature

All of a shapefile’s files must be stored in the same location with the same base name, or the shapefile will not load.  A shapefile opened in Windows Explorer will also look different than it does in ArcCatalog.

[Image: a shapefile’s component files as seen in Windows Explorer]
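To see that bundling for yourself, a short Python snippet can list the sidecar files that share a shapefile’s base name. This is only an illustration; the folder and layer name ("roads") are hypothetical examples.

```python
import glob
import os

# A shapefile is really a bundle of files sharing one base name.
# "C:/gis_data/roads" is a hypothetical example layer.
for path in sorted(glob.glob("C:/gis_data/roads.*")):
    print(os.path.basename(path), "-", os.path.getsize(path), "bytes")
# Expect at least roads.shp, roads.shx, and roads.dbf; companions such as
# roads.prj and roads.sbn are common.
```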

 

Basics About Geodatabases:

Geodatabases allow users to thematically organize their data and store spatial databases, tables, and raster datasets.  There are two types of single-user geodatabases: the File Geodatabase and the Personal Geodatabase.  File geodatabases have many benefits, including:

  • A 1 TB storage limit for each dataset
  • Better performance than a Personal Geodatabase
  • Many users can view data inside a File Geodatabase while another user is editing it
  • The geodatabase can be compressed, which helps reduce its size on disk

Personal Geodatabases, on the other hand, were originally designed to be used in conjunction with Microsoft Access, and the geodatabase is stored as an Access file (.mdb).  A Personal Geodatabase can therefore be opened directly in Microsoft Access, but the entire geodatabase is limited to 2 GB of storage.

To organize your data into themes, you can create Feature Datasets within a geodatabase.  Feature datasets store Feature Classes (the geodatabase equivalent of shapefiles) that share the same coordinate system.  Like shapefiles, feature classes can store points, lines, and polygons; they can also store annotation and dimension features.
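If you prefer scripting to clicking through ArcCatalog, the same structure can be built with arcpy. This is only a sketch, assuming an ArcGIS install (which provides arcpy); the folder, geodatabase and dataset names, and the coordinate system are hypothetical examples.

```python
import arcpy

# Create a file geodatabase (hypothetical folder and name).
arcpy.CreateFileGDB_management(r"C:\gis_data", "Durham_County.gdb")

# A feature dataset groups feature classes that share one coordinate system.
sr = arcpy.SpatialReference(4326)  # WGS 1984
arcpy.CreateFeatureDataset_management(
    r"C:\gis_data\Durham_County.gdb", "Transportation", sr)
```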

[Image: the Durham_County file geodatabase and its feature datasets in ArcCatalog]

In order to create advanced datasets in ArcGIS (such as a network dataset, a geometric network, a terrain dataset, or a parcel fabric) or to run topology on an existing layer, you will need to create a Feature Dataset.

You will not be able to access the individual files of a File Geodatabase in Windows Explorer.  If you browse to it there, the Durham_County geodatabase shown above will look like this:

[Image: the file geodatabase’s contents as seen in Windows Explorer]

 

Tips:

  • Whenever you copy shapefiles, use ArcCatalog. If you use Windows Explorer and do not select all of the files that make up the shapefile, the shapefile will be corrupted and will not load.
  • When using a geodatabase, use a File Geodatabase. It offers more storage capacity, multiple users can view/read the database at the same time, and it runs tools and queries faster than a Personal Geodatabase.
  • Use a shapefile when you only need to read the attribute table or run one or two tools/processes. Long-term projects should be organized into a File Geodatabase and Feature Datasets.
  • Many files downloaded from the internet are shapefiles. To bring one into your geodatabase, right click the shapefile in ArcCatalog, click “Export,” and select “To Geodatabase (single).” A scripted version of this step is sketched below.
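For batches of downloaded shapefiles, that export step can also be scripted with arcpy. A minimal sketch, again assuming ArcGIS is installed; the shapefile and geodatabase paths are hypothetical examples.

```python
import arcpy

# Copy a downloaded shapefile into a file geodatabase as a feature class.
# Both paths below are hypothetical examples.
arcpy.FeatureClassToFeatureClass_conversion(
    in_features=r"C:\downloads\landmarks.shp",
    out_path=r"C:\gis_data\Durham_County.gdb",
    out_name="landmarks")
```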


Welcoming our new Data Visualization Analyst — Eric Monson

Data and Visualization Services is proud and excited to welcome Eric Monson, Ph.D., our newest staff member. Eric joins the team as our Data Visualization Analyst, working with Angela Zoss to provide support for data visualization across Duke’s campus and community.

Eric worked for several years under the supervision of Rachael Brady, who was the head of the Visualization Technology Group (now the Visualization and Interactive Systems group), the founder of the DiVE, and a hub for the visualization community at Duke. Though transitioning from work in applied physics, Eric quickly became an active member of the broader visualization research community, sharing his experiences developing interactive visualization applications through online forums and professional organizations. His natural design sense contributes to an elegant portfolio of past work, and his work on projects in both the sciences and the humanities gives him an extremely wide range of experience with different datasets, tools, and techniques.

Since DVS began offering visualization services in 2012, Eric has been an active supporter and collaborator. While continuing to work as a Research Scientist, Eric has co-organized the Visualization Friday Forum speaker series, teamed up with Angela on instructional sessions, and been an active supporter of visualization events and initiatives. He is an experienced and patient instructor and will bring many years of consulting experience to bear in this new role.

Over the past three years, demand for visualization support has steadily increased at Duke. With an active workshop series, guest lectures in a variety of courses, individual and small-group consultations, and programming such as the Student Data Visualization Contest, DVS is very happy to be able to boast two staff members with visualization expertise. In the near future, we hope to increase our visualization workshop offerings and continue to identify powerful but easy-to-use tools and techniques that will meet the needs of Duke visualizers. Taking advantage of Eric’s background in sciences and humanities, DVS looks forward to being able to answer a broader range of questions and offer a more diverse set of solutions.

Please join us in welcoming Eric to the team!  As always, feel free to contact askdata@duke.edu with any questions or data-driven research needs.

DVS Fall Workshops

Data and Visualization Services is happy to announce its Fall 2015 Workshop Series.  With workshops covering everything from basic data skills to data visualization, we have courses for a variety of interests and skill levels.  New (and redesigned) workshops include:

  • OpenRefine: Data Mining and Transformations, Text Normalization
  • Historical GIS
  • Advanced Excel for Data Projects
  • Analysis with R
  • Webscraping and Gathering Data from Websites

Workshop descriptions and registration information are available at:

library.duke.edu/data/news

 

Workshop Schedule

  • OpenRefine: Data Mining and Transformations, Text Normalization (Sep 9)
  • Basic Data Cleaning and Analysis for Data Tables (Sep 15)
  • Introduction to ArcGIS (Sep 16)
  • Easy Interactive Charts and Maps with Tableau (Sep 18)
  • Introduction to Stata (Sep 22)
  • Historical GIS (Sep 23)
  • Advanced Excel for Data Projects (Sep 28)
  • Easy Interactive Charts and Maps with Tableau (Sep 29)
  • Analysis with R (Sep 30)
  • ArcGIS Online (Oct 1)
  • Web Scraping and Gathering Data from Websites (Oct 2)
  • Advanced Excel for Data Projects (Oct 6)
  • Basic Data Cleaning and Analysis for Data Tables (Oct 7)
  • Introduction to Stata (Oct 14)
  • Introduction to ArcGIS (Oct 15)
  • OpenRefine: Data Mining and Transformations, Text Normalization (Oct 20)
  • Analysis with R (Oct 20)

 

ModelBuilder

Ever have trouble conceptualizing your project workflow?  ModelBuilder allows you to plan your project before you run any tools.  When using ModelBuilder in Esri’s ArcMap, you create a workflow of your project by adding the data and tools you need.  To open ModelBuilder, click the ModelBuilder icon in the Standard Toolbar.


Key Points Before You Build Your Model

Models can only be created and saved in a toolbox.  In order to create a model, you first need to create a new toolbox in the Toolboxes > My Toolboxes folder in ArcCatalog.  Once you have a new toolbox, you will need to create a new model; to do this, right click your newly created toolbox and select New, then Model.  When you wish to open an existing model, find your toolbox, right click the model, and select Edit.

In order to find the results of your model and the data created in the middle of your project workflow (also known as intermediate data), you will need to direct the data to a workspace or a Scratch Geodatabase.  To set your data results to a Scratch Geodatabase in ModelBuilder, click Model, then Model Properties.  A dialog box will open; select the Environments tab, check Scratch Workspace under the Workspace category, then select “Values” and navigate to your workspace or geodatabase.
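For comparison, a geoprocessing script can point intermediate results at a scratch workspace with a single environment setting. A minimal arcpy sketch, with a hypothetical path:

```python
import arcpy

# Direct intermediate outputs to a scratch workspace (hypothetical path).
arcpy.env.scratchWorkspace = r"C:\gis_data\scratch.gdb"
print(arcpy.env.scratchGDB)  # the resolved scratch file geodatabase
```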


Building and Running a Model

To build a model, click the Add Data or Tool button.  Navigate to the System Toolboxes, find the tool you wish to run, and add it to your model.  Double click the tool within the model and its parameters will open.  Fill out the appropriate fields for the tool and select OK.

When the tools or variables are ready for processing, they are colored blue, green, or yellow: blue elements are input variables, yellow elements are tools, and green elements are output variables.  When there is an error or the parameters have not been set, the elements have no color.

[Image: example model chaining Select Layer By Attribute, Clip, Buffer, and Intersect]

Once you have built your model, click the Run icon to run it.  Depending on the data and the number of tools, the model can take seconds or minutes to run.  You can also run one tool at a time; to do this, right click the tool and select “Run.”  When the model has finished running, the tools and outputs will have a gray background.  To find the results of your model, navigate to the Scratch Workspace you set and add the shapefile or table to ArcMap, or right-click the output variable before running the model and select “Add to Display.”

Applying ModelBuilder

The model above demonstrates how to take nationwide county data, North Carolina landmark data, and North Carolina major roads data and find landmarks in Wake County that are within 1 mile of major roads.  The first tool in the model (the Select Layer By Attribute tool) extracts Wake County from the nationwide counties polygon layer.

Once Wake County is extracted to a new layer, the North Carolina landmarks layer is clipped to the Wake County layer using the Clip tool, which creates a landmarks point layer for Wake County.  The third tool runs the Buffer tool on the primary roads layer in North Carolina; within the Buffer tool parameters, a distance of 1 mile is chosen and a new polygon layer is created.

 

Finally, the Wake County landmarks layer is intersected with the buffered major roads layer using the Intersect tool to create the final output.  Using ModelBuilder has many benefits: it documents the steps you used to create your project, and you can easily rerun the model with different inputs after it is built.  ModelBuilder also allows users to easily determine if and where problems occur in the workflow.  When there is an error, a “Failed to Execute” message will appear and tell you which tool was unable to execute.  ModelBuilder also lets users easily change parameters; in the model above, you could change the expression in the Select Layer By Attribute tool from ‘Wake’ to ‘Durham’ and find landmarks within 1 mile of major roads in Durham County.
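Once a workflow like this is settled, it can also be reproduced outside ModelBuilder as an arcpy script. The sketch below follows the same four steps; it is only a sketch assuming ArcGIS is installed, and every dataset path, the NAME field, and the attribute values are hypothetical examples rather than the data used above.

```python
import arcpy

arcpy.env.workspace = r"C:\gis_data\scratch.gdb"   # hypothetical scratch workspace
arcpy.env.overwriteOutput = True

# Hypothetical input datasets
counties = r"C:\gis_data\us_counties.shp"
landmarks = r"C:\gis_data\nc_landmarks.shp"
roads = r"C:\gis_data\nc_major_roads.shp"

# 1. Select Wake County from the nationwide counties layer.
county_lyr = arcpy.MakeFeatureLayer_management(counties, "county_lyr")
arcpy.SelectLayerByAttribute_management(county_lyr, "NEW_SELECTION", "NAME = 'Wake'")
wake = arcpy.CopyFeatures_management(county_lyr, "wake_county")

# 2. Clip the landmarks to Wake County.
wake_landmarks = arcpy.Clip_analysis(landmarks, wake, "wake_landmarks")

# 3. Buffer the major roads by one mile.
road_buffer = arcpy.Buffer_analysis(roads, "roads_1mi", "1 Miles")

# 4. Intersect the clipped landmarks with the road buffer.
arcpy.Intersect_analysis([wake_landmarks, road_buffer], "landmarks_near_roads")
```

Just as in the ModelBuilder example, changing the where clause from 'Wake' to 'Durham' reruns the whole analysis for Durham County.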