All posts by Joel Herndon, Ph.D.

Fall 2020 – CDVS Research and Education During COVID-19

The Center for Data and Visualization Sciences is glad to welcome you back to a new academic year! We’re excited to have friends and colleagues returning to the Triangle and happy to connect with Duke community members who will not be on campus this fall.

This fall, CDVS will expand its existing online consultations with a new chat service and new online workshops for all members of the Duke community. Since mid-March, CDVS staff have redesigned instructional sessions, constructed new workflows for accessing research data, and built new platforms for accessing data tools virtually. We look forward to connecting with you online and working with you to achieve your research goals.

In addition to our expanded online tools and instruction, we have redesigned our CDVS-Announce data newsletter to provide a monthly update of data news, events, and workshops at Duke. We hope you will consider subscribing.

Upcoming Virtual CDVS Workshops

CDVS continues to offer a full workshops series for the latest strategies and tools for data focused research. Upcoming workshops for early September include:

R for data science: getting started, EDA, data wrangling
Thursday, Sep 1, 2020 10am – 12pm
This workshop is part of the Rfun series. R and the Tidyverse are a data-first coding language that enables reproducible workflows. In this two-part workshop, you’ll learn the fundamentals of R, everything you need to know to quickly get started. You’ll learn how to access and install RStudio, how to wrangle data for analysis, gain a brief introduction to visualization, practice Exploratory Data Analysis (EDA), and how to generate reports.
Register: https://duke.libcal.com/event/6867861

Research Data Management 101
Wednesday, Sep 9, 2020 10am – 12pm
This workshop will introduce data management practices for researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented using examples that span disciplines. During the workshop, participants will also engage in discussions with their peers on data management concepts as well as learn about how to assess data management tools.
Register: https://duke.libcal.com/event/6874814

R for Data Science: Visualization, Pivot, Join, Regression
Wednesday, Sep 9, 2020 1pm – 3pm
This workshop will introduce data management practices for researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented using examples that span disciplines. During the workshop, participants will also engage in discussions with their peers on data management concepts as well as learn about how to assess data management tools.
Register: https://duke.libcal.com/event/6867914

ArcGIS StoryMaps
Thursday, September 10, 2020 1pm – 2:30pm
This workshop will help you get started telling stories with maps on the ArcGIS StoryMaps platform. This easy-to-use web application integrates maps with narrative text, images, and videos to provide a powerful communication tool for any project with a geographic component. We will explore the capabilities of StoryMaps, share best practices for designing effective stories, and guide participants step-by-step through the process of creating their own application.
Register: https://duke.libcal.com/event/6878545

Assignment Tableau: Intro to Tableau work-together
Friday, September 11, 2020 10am – 11:30am
Work together over Zoom on an Intro to Tableau assignment. Tableau Public (available for both Windows and Mac) is incredibly useful free software that allows individuals to quickly and easily explore their data with a wide variety of visual representations, as well as create interactive web-based visualization dashboards. Attendees are expected to watch Intro to Tableau Fall 2019 online first, or have some experience with Tableau. This will be an opportunity to work together on the assignment from the end of that workshop, plus have questions answered live.
Register: https://duke.libcal.com/event/6878629

2020 RStudio Conference Livestream Coming to Duke Libraries

RStudio 2020 Conference LogoInterested in attending the 2020 RStudio Conference, but unable to travel to San Francisco? With the generous support of RStudio and the Department of Statistical Science, Duke Libraries will host a livestream of the annual RStudio conference starting on Wednesday, January 29th at 11AM. See the latest in machine learning, data science, data visualization, and R. Registration links and information about sessions follow. Registration is required for the first session and keynote presentations.  Please see the links in the agenda that follows.

Wednesday, January 29th

Location: Rubenstein Library 249 – Carpenter Conference Room

11:00 – 12:00 RStudio Welcome – Special Live Opening Interactive Event for Watch Party Groups
12:00 – 1:00 Welcome for Hadley Wickham and Opening Keynote – Open Source Software for Data Science (JJ Allaire)
1:00 – 2:00 Data, visualization, and designing with AI (Fernanda Viegas and Martin Wattenberg, Google)
2:30 – 4:00 Education Track (registration is not required)
Meet you where you R – Lauren Chadwick, R Studio.
Data Science Education in 2022 (Karl Howe and Greg Wilson, R Studio)
Data science education as an economic and public health intervention in East Baltimore (Jeff Leek, Johns Hopkins)
Of Teacups, Giraffes, & R Markdown (Desiree Deleon, Emory)

Location: Edge Workshop Room – Bostock 127

5:15 – 6:45 All About Shiny  (registration is not required)
Production-grade Shiny Apps with golem (Colin Fay, ThinkR)
Making the Shiny Contest (Duke’s own Mine Cetinkaya-Rundel)
Styling Shiny Apps with Sass and Bootstrap 4(Joe Cheng, RStudio)
Reproducible Shiny Apps with shinymeta (Carson Stewart, RStudio)
7:00 – 8:30 Learning and Using R (registration is not required)
Learning and using R: Flipbooks (Evangeline Reynolds, U Denver)
Learning R with Humorous Side Projects (Ryan Timpe, Lego Group)
Toward a grammar of psychological Experiments (Danielle, Navaro, University of New South Wales)
R for Graphical Clinical Trial Reporting(Frank Harrell, Vanderbilt)

Thursday, January 30th

Location: Edge Workshop Room – Bostock 127

12:00 – 1:00 Keynote: Object of type closure is not subsettable (Jenny Bryan, RStudio)
1:23 – 3:00 Data Visualization Track (registration is not required)
The Glamour of Graphics (William Chase, University of Pennsylvania)
3D ggplots with rayshader (Dr. Tyler Morgan-Wall, Institute for Defense Analyses)
Designing Effective Visualizations (Miriah Meyer, University of Utah)
Tidyverse 2019-2020 (Hadley Wickham, RStudio)
3:00 – 4:00 Livestream of Rstudio Conference Sessions (registration is not required)
4:00 – 5:30 Data Visualization Track 2 (registration is not required)
Spruce up your ggplot2 visualizations with formatted text (Claus Wilke, UT Austin)
The little package that could: taking visualizations to the next level with the scales package (Dana Seidel, Plenty Unlimited)
Extending your ability to extend ggplot2 (Thomas Lin Pedersen, RStudio)
5:45 – 6:30 Career Advice for Data Scientists Panel Discussion (registration is not required)
7:00 – 8:00 Keynote: NSSD Episode 100 (Hillary Parker, Stitchfix and Roger Peng, JHU)

Introducing Duke Libraries Center for Data and Visualization Sciences

As data driven research has grown at Duke, Data and Visualization Services receives an increasing number of requests for partnerships, instruction, and consultations. These requests have deepened our relationships with researchers across campus such that we now regularly interact with researchers in all of Duke’s schools, disciplines, and interdepartmental initiatives.

In order to expand the Libraries commitment to partnering with researchers on data driven research at Duke, Duke University Libraries is elevating the Data and Visualization Services department to the Center for Data and Visualization Sciences (CDVS). The change is designed to enable the new Center to:

  • Expand partnerships for research and teaching
  • Augment the ability of the department to partner on grant, development, and funding opportunities
  • Develop new opportunities for research, teaching, and collections – especially in the areas of data science, data visualization, and GIS/mapping research
  • Recognize the breadth and demand for the Libraries expertise in data driven research support
  • Enhance the role of CDVS activities within Bostock Libraries’ Edge Research Commons

We believe that the new Center for Data and Visualization Sciences will enable us to partner with an increasingly large and diverse range of data research interests at Duke and beyond through funded projects and co-curricular initiatives at Duke. We look forward to working with you on your next data driven project!

Expanding Support for Data Visualization in Duke Libraries

Angela ZossOver the last six years, Data and Visualization Services (DVS) has expanded support for data visualization in the Duke community under the expert guidance of Angela Zoss. In this period, Angela developed Duke University Libraries’ visualization program through a combination of thoughtful consultations, training, and events that expanded the community of data visualization practice at Duke while simultaneously increasing the impact of Duke research.

As of May 1st, Duke Libraries is happy to announce that Angela will expand her role in promoting data visualization in the Duke community by transitioning to a new position in the library’s Assessment and User Experience department. In her new role, Angela will support a larger effort in Duke Libraries to increase data-driven decision making. In Data and Visualization Services, Eric Monson will take the lead on research consultation and training for data visualization in the Duke community. Eric, who has been a data visualization analyst with DVS since 2015 and has a long history of supporting data visualization at Duke, will serve as DVS’ primary contact for data visualization.

DVS wishes Angela success in her new position. We look forward to continuing to work with the Duke community to expand data visualization research on campus.

Fall Data and Visualization Workshops

2017 Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series.  Our workshops are designed for researchers who are new to data driven research as well as those looking to expand skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of different data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro

Python

11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 

Stata

09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 

 

 

 

 

 

 

 

 

 

 

 

 

New Data Management Services @ Duke

Data ManagementDuke Libraries are happy to announce a new set of research data management services designed to help researchers secure grant funding, increase research impact, and preserve valuable data. Building on the recommendations of the Digital Research Faculty Working Group and the Duke Digital Research Data Services and Support report, Data and Visualization Services have added two new research data management consultants who are available to work with researchers across the university and medical center on a broad range of data management concerns from data creation to data curation.

Interested in learning more about data management?

Our New Data Management Consultants

sophialh2Sophia Lafferty-Hess attended the University of North Carolina at Chapel Hill where she received a Master of Science in Information Science and Master of Public Administration. Prior to coming to Duke, Sophia worked at the Odum Institute for Research in Social Science at UNC-Chapel Hill within the Data Archive as a Research Data Manager. In this position, Sophia provided consultations to researchers on data management best practices, curated research data to support long-term preservation and reuse, and provided training and instruction on data management policies, strategies, and tools.

While at Odum, Sophia also helped lead the development of a data curation and verification service for journals to help enforce data sharing and replication policies, which included verifying that data meet quality standards for reuse and that the data and code can properly reproduce the analytic results presented in the article. Sophia’s current research interests include the impact of journal data sharing policies on data availability and the development of data curation workflows.

jen2Jen Darragh comes to us from Johns Hopkins University where she served for the past seven years as the Data Services and Sociology Librarian, and Hopkins Population Center Restricted Projects Coordinator.  In this position, Jen  developed the libraries’ Restricted Data Room and designed the secure data enclave spaces and staff support for the Johns Hopkins Population Center.

Jen received her Bachelor of Arts Degree in Psychology from Westminster College (PA) and her Master of Library and Information Sciences degree from the University of Pittsburgh.  She has been involved with socio-behavioral research data throughout her career.  Jen is particularly interested in the development of centralized, controlled data access for sensitive human subjects’ data (subject to HIPAA or FERPA requirements) to facilitate broader, yet more secure sharing of existing research data as a means to produce new, cutting-edge research.

 

Duke Libraries and SSRI welcome Mara Sedlins!

On behalf of Duke Libraries and the Social Science Research Institute, I am happy to welcome Mara Sedlins to Duke.  As the library and SSRI work to develop a rich set of data management, analysis, and archiving strategies for Duke researchers, Mara’s postdoctoral position provides a unique opportunity to work closely with researchers across campus to improve both training and workflows for data curation at Duke.  – Joel Herndon, Head of Data and Visualization Services, Duke Libraries  

2016-08-25 11.06.17 HDRI am excited to join the Data and Visualization Services team this fall as a postdoctoral fellow in data curation for the social sciences (sponsored by CLIR and funded by the Alfred P. Sloan Foundation). For the next two years, I will be working with Duke Libraries and the Social Science Research Institute to develop best practices for managing a variety of research data in the social sciences.

My research background is in social and personality psychology. I received my PhD at the University of Washington, where I worked to develop and validate a new measure of automatic social categorization – to what extent do people, automatically and without conscious awareness, sort faces into socially constructed categories like gender and race? The measure has been used in studies examining beliefs about human genetic variation and the racial labels people assign to multiracial celebrities like President Barack Obama.

While in Seattle, I was also involved in several projects at Microsoft Research assessing computer-supported cooperative work technologies, focusing on people’s preferences for different types of avatar representations, compared to video or audio-only conferencing. I also have experience working with data from a study of risk factors for intimate partner violence, managing a database of donors and volunteers for a historical archive, and organizing thousands of high-resolution images for a large-scale digital comic art restoration project.

I look forward to applying the insights gained from working on a diverse array of data-intensive projects to the problem of developing and promoting best practices for data management throughout the research lifecycle.  I am particularly interested in questions such as:

  • How can researchers write actionable data management plans that improve the quality of their research?
  • What strategies can be used to organize and document data files during a project so that it’s easy to find and understand them later?
  • What steps need to be taken so that data can be discovered and re-used effectively by other researchers?

These are just a few of the questions that are central to the rapidly evolving field of data curation for the sciences and beyond.

 

Fall 2016 DVS Workshop Series

GenericWorkshops-01Data and Visualization Services is happy to announce its Fall 2016 Workshop Series. Learn new ways of enhancing your research with a wide range of data driven research methods, data tools, and data sources.

Can’t attend a session?  We record and share most of our workshops online.  We are also happy to consult on any of the topics above in person.  We look forward to seeing you in the workshops, in the library, or online!

Data Sources
 
Data Cleaning and Analysis
 
Data Analysis
Introduction to Stata (Two sessions: Sep 21, Oct 18)
 
Mapping and GIS
Introduction to ArcGIS (Two sessions: Sep 14, Oct 13)
ArcGIS Online (Oct 17)
 
Data Visualization

Visualizing Qualitative Data (Oct 19)
Visualizing Basic Survey Data in Tableau – Likert Scales (Nov 10)

Data Fest 2016 Workshop Series

Duke Libraries are happy to welcome the 2016 ASA DataFest to the Edge on April 1-3rd.  As part of  DataFest 2016, the Edge is hosting five DatadataFestFest related workshops designed to help teams and others interested in data driven research expand their skills.   All workshops will meet in the Edge Workshop Room (1st Floor Bostock Library).  Laptops are required for all workshops.

We wish all the teams success in the competition and hope to see you in the next few weeks!


 

DataFest Workshop Series

Data Analysis with Python
Tuesday, March 22
6:00-9:00 PM
This will be a hands-on class focused on performing data analysis with Python. We’ll help participants set-up their Jupyter Notebook development environment, cover the basic functions for reading and manipulating data, show examples of common statistical models and useful packages and show some of the python visualization tools.

Introduction to R
Wednesday March 23
6:00-8:00 PM
Introduction to R as a statistical programming language. This session will introduce the basics of R syntax, getting data into R, various data types and classes, etc. The session assumes no or little background in R.

Data Munging with R and dplyr
Monday, March 28
6:00-8:00 PM
This session will demonstrate tools for data manipulation and cleaning of data in R. Majority of the session will use the dplyr and tidyr packages. Some background in R is recommended. If you are not familiar with R, make sure to first attend the first R workshop in the series.

Data visualization with R, ggplot2, and shiny
Wednesday, March 30
6:00-8:00 PM
This session will demonstrate tools for static and interactive data visualization in R using ggplot2 and shiny packages. Some background in R is recommended. If you are not familiar with R, make sure to first attend the first R workshop in the series.

EDA and Interactive Predictive Modeling with JMP
Thursday, March 31
4:00-6:00 PM
JMP® Statistical Discovery Software is dynamic, visual and interactive desktop software for Windows and Mac. In this hands-on workshop we see tools for exploring, visualizing and preparing data in JMP. We’ll also learn how to fit a variety of predictive models, including multiple regression, logistic regression, classification and regression trees, and neural networks. A six month license of JMP will be provided.

Data and Visualization Spring 2016 Workshops

Spring 2016 DVS WorkshopsSPRING 2016: Data and Visualization Workshops 

Interested in getting started in data driven research or exploring a new approach to working with research data?  Data and Visualization Services’ spring workshop series features a range of courses designed to showcase the latest data tools and methods.  Begin working with data in our Basic Data Cleaning/Analysis or the new Structuring Humanities Data  workshop.  Explore data visualization in the Making Data Visual class.  Our wide range of workshops offers a variety of approaches for the meeting the challenges of 21st century data driven research.   Please join us!

Workshop by Theme

DATA SOURCES

DATA CLEANING AND ANALYSIS

DATA ANALYSIS

MAPPING AND GIS

DATA VISUALIZATION

* – For these workshops, no prior experience with data projects is necessary!  These workshops are great introductions to basic data practices.