Category Archives: Data Visualization

Introducing Felipe Álvarez de Toledo, 2019-2020 Humanities Unbounded Digital Humanities Graduate Assistant

Felipe Álvarez de Toledo López-Herrera is a Ph.D. candidate at the Art, Art History, and Visual Studies Department at Duke University and a Digital Humanities Graduate Assistant for Humanities Unbounded, 2019-2020.  Contact him at askdata@duke.edu.

Over the 2019-2020 academic year, I am serving as a Humanities Unbounded graduate assistant in Duke Libraries’ Center for Data and Visualization Sciences. As one of the three Humanities Unbounded graduate assistants, I will partner on Humanities Unbounded projects and focus on developing skills that are broadly applicable to support humanities projects at Duke. In this blog post, I would like to introduce myself and give readers a sense of my skills and interests. If you think my profile could address some of the needs of your group, please reach out to me through the email above!

My own dissertation project began with a data dilemma. Four hundred years ago, paintings were shipped across the Atlantic by the thousands. They were sent by painters and dealers in places like Antwerp or Seville, for sale in the Spanish colonies. But most of these paintings were not made to last. Cheap supports and shifting fashions guaranteed a constant renewal of demand, and thus more work for painters, in a sort of proto-industrial planned obsolescence.[1] As a consequence, the canvas, the traditional data point of art history, was not a viable starting point for my own research, rendering powerless many of the tools that art history has developed for studying painting. I was interested in examining the market for paintings as it developed in Seville, Spain from 1500-1700; it was a major productive center which held the idiosyncratic role of controlling all trade to the Spanish colonies for more than 200 years. But what could I do when most of the work produced within it no longer exists?

This problem drives my research here at Duke, where I apply an interdisciplinary, data-driven approach. My own background is the product of two fields: I obtained a bachelor’s degree in Economics in my hometown of Barcelona, Spain in 2015 from the Universitat Pompeu Fabra, and simultaneously attended art history classes at the University of Barcelona. This combination found a natural mid-way point in the study of art markets. I came to Duke to be a part of DALMI, the Duke Art, Law and Markets Initiative, led by Professor Hans J. Van Miegroet, where I was introduced to the methodologies of data-driven art historical research.

Documents in Seville’s archives reveal a stunning diversity of production that encompasses the religious art for which the city is known, but also includes still lifes, landscapes and genre scenes whose importance has been understated and of which few examples remain [Figures 1 & 2]. But analysis of individual documents, or small groups of them, yields limited information. Aggregation, with an awareness of the biases and limitations in the existing corpus of documents, seems to me a way to open up alternative avenues for research. I am creating a database of painters in the city of Seville from 1500-1699, where I pool known archival documentation relating to painters and painting in this city and extract biographical, spatial and productive data to analyze the industry. I explore issues such as the industry’s size and productive capacity, its organization within the city, reactions to historical change and, of course, its participation in transatlantic trade.

This approach has obliged me to become familiar with a wide range of digital tools. I use OpenRefine for cleaning data, R and Stata for statistical analysis, Tableau for creating visualizations and ArcGIS for visualizing and generating spatial data (see examples of my own work below [Figures 3-4]). I have also learned the theory behind relational databases and am learning to use MySQL for my own project; similarly, for the data-gathering process I am interested in learning data-mining techniques through machine learning. I have been using a user-friendly program called RapidMiner to simplify some of my own data gathering.

Thus, I am happy to help any groups that have a data set and want to learn how to visualize it graphically, whether through graphs, charts or maps. I am also happy to help groups think about their data gathering and storage. I like to consider data in the broadest terms: almost anything can be data, if we correctly conceptualize how to gather and utilize it realistically within the limits of a project. I would like to point out that this does not necessarily need to result in visualization; this is also applicable if a group has a corpus of documents that they want to store digitally. If any groups have an interest in text mining and relational databases, we can learn simultaneously—I am very interested in developing these skills myself because they apply to my own project.

I can:

  • Help you consider potential data sources and the best way to extract the information they contain
  • Help you make them usable: teach you to structure, store and clean your data
  • And of course, help you analyze and visualize them
    • With Tableau: for graphs and infographics that can be interactive and can easily be embedded into dashboards on websites.
    • With ArcGIS: for maps that can also be interactive and embedded onto websites or in their Stories function.
  • Help you plan your project through these steps, from gathering to visualization.

Once again, if you think any of these areas are useful to you and your project, please do not hesitate to contact me. I look forward to collaborating with you!

[1] Miegroet, Hans J. Van, and Marchi, ND. “Flemish Textile Trade and New Imagery in Colonial Mexico (1524-1646).” Painting for the Kingdoms. Ed. J Brown. Fomento Cultural BanaMex, Mexico City, 2010. 878-923.

Introducing Duke Libraries Center for Data and Visualization Sciences

As data-driven research has grown at Duke, Data and Visualization Services receives an increasing number of requests for partnerships, instruction, and consultations. These requests have deepened our relationships with researchers across campus such that we now regularly interact with researchers in all of Duke’s schools, disciplines, and interdepartmental initiatives.

In order to expand the Libraries’ commitment to partnering with researchers on data-driven research at Duke, Duke University Libraries is elevating the Data and Visualization Services department to the Center for Data and Visualization Sciences (CDVS). The change is designed to enable the new Center to:

  • Expand partnerships for research and teaching
  • Augment the ability of the department to partner on grant, development, and funding opportunities
  • Develop new opportunities for research, teaching, and collections – especially in the areas of data science, data visualization, and GIS/mapping research
  • Recognize the breadth and demand for the Libraries’ expertise in data-driven research support
  • Enhance the role of CDVS activities within Bostock Libraries’ Edge Research Commons

We believe that the new Center for Data and Visualization Sciences will enable us to partner with an increasingly large and diverse range of data research interests at Duke and beyond through funded projects and co-curricular initiatives at Duke. We look forward to working with you on your next data-driven project!

Expanding Support for Data Visualization in Duke Libraries

Over the last six years, Data and Visualization Services (DVS) has expanded support for data visualization in the Duke community under the expert guidance of Angela Zoss. In this period, Angela developed Duke University Libraries’ visualization program through a combination of thoughtful consultations, training, and events that expanded the community of data visualization practice at Duke while simultaneously increasing the impact of Duke research.

As of May 1st, Duke Libraries is happy to announce that Angela will expand her role in promoting data visualization in the Duke community by transitioning to a new position in the library’s Assessment and User Experience department. In her new role, Angela will support a larger effort in Duke Libraries to increase data-driven decision making. In Data and Visualization Services, Eric Monson will take the lead on research consultation and training for data visualization in the Duke community. Eric, who has been a data visualization analyst with DVS since 2015 and has a long history of supporting data visualization at Duke, will serve as DVS’ primary contact for data visualization.

DVS wishes Angela success in her new position. We look forward to continuing to work with the Duke community to expand data visualization research on campus.

Using Tableau with Qualtrics data at Duke

The end of the spring semester always brings presentations of final projects, some of which may have been in the works since the fall or even the summer. Tableau, a software application designed specifically for visualization, is a great option for projects that would benefit from interactive charts and maps.

Visualizing survey data, however, can be a bit of a pain. If your project uses Qualtrics, for example, you may be having trouble getting the data ready for visualization and analysis. Qualtrics is an extremely powerful survey tool, but the data it creates can be very complicated, and typical data analysis tools aren’t designed to handle that complexity.

Luckily, here at Duke, Tableau users can use Tableau’s Web Data Connector to pull Qualtrics data directly into Tableau! It’s so easy, you may never analyze your Qualtrics data another way again.

Process

Here are the basics. There are also instructions from Qualtrics.

In Qualtrics: Copy your survey URL

Screenshot of Tableau URL in Qualtrics

  • Go to your Duke Qualtrics account
  • Click on the survey of interest
  • Click on the Data & Analysis tab at the top
  • Click on the Export & Import button
  • Select Export Data
  • Click on Tableau
  • Copy the URL

In Tableau (Public or Desktop): Paste your survey URL

Tableau Web Data Connection

  • Under Connect, click on Web Data Connector (may be under “More…” for Tableau Public or “To a server… More…” for Tableau Desktop)
  • Paste the survey URL into the web data connector URL box and hit enter/return
  • When a login screen appears, click the tiny “Api Token Login” link, which should be below the green Log in button

In Qualtrics: Create and copy your API token

Generate Qualtrics API Token

  • Go to your Duke Qualtrics account
  • Click on your account icon in the upper-right corner
  • Select Account Settings…
  • On the Account Settings page, click on the Qualtrics IDs tab
  • Under API, check for a token. If you don’t have one yet, click on Generate Token
  • Copy your token

In Tableau (Public or Desktop): Paste your API token

  • Paste in your API token and click the Login button
  • Select the data fields you would like to import

Note: there is an option to “transpose” some of the fields on import. This is useful for many of the types of visualizations you might want to create from survey data. Typically, you want to transpose fields that represent the questions asked in the survey, but you may not want to transpose demographics data or identifiers. See also the Qualtrics tips on transposing data.
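To make the idea of transposing concrete, here is a minimal sketch in plain Python. It assumes that "transpose" means the usual wide-to-long reshape (one row per respondent and question instead of one column per question); the field names `ResponseID`, `Gender`, `Q1`, and `Q2` and the helper `transpose_questions` are hypothetical and are not part of Qualtrics or Tableau.

```python
# Hypothetical survey export: one row per respondent, one column per question.
# The field names (ResponseID, Gender, Q1, Q2) are made up for illustration.
wide_rows = [
    {"ResponseID": "R1", "Gender": "F", "Q1": 4, "Q2": 5},
    {"ResponseID": "R2", "Gender": "M", "Q1": 2, "Q2": 3},
]


def transpose_questions(rows, keep_fields, question_fields):
    """Reshape wide rows into one row per (respondent, question) pair."""
    long_rows = []
    for row in rows:
        for question in question_fields:
            # Identifier and demographic fields are copied unchanged.
            long_row = {field: row[field] for field in keep_fields}
            long_row["Question"] = question
            long_row["Answer"] = row[question]
            long_rows.append(long_row)
    return long_rows


long_rows = transpose_questions(wide_rows, ["ResponseID", "Gender"], ["Q1", "Q2"])
for long_row in long_rows:
    print(long_row)
```

In the long form, a single "Question" field can drive a filter or a color encoding, which is why question columns are the usual candidates for transposing while identifiers and demographics stay as they are.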

Resources

For more tips on how to use Tableau with Qualtrics data, check out the resources below:

Can’t we just make a Venn diagram?

When I’m teaching effective visualization principles, one of the most instructive processes is critiquing published visualizations and reviewing reworks done by professionals. I’ll often show examples from Cole Nussbaumer Knaflic’s blog, Storytelling with Data, and Jon Schwabish’s blog, The Why Axis. Both brilliant! (Also, check out my new favorite blog Uncharted, by Lisa Charlotte Rost, for wonderful visualization discussions and advice!)

What we don’t usually get to see is the progression of an individual visualization throughout the design process, from data through rough drafts to final product. I thought it might be instructive to walk through an example from one of my recent consults. Some of the details have been changed because the work is unpublished and the jargon doesn’t help the story.

Data full of hits and misses

Five tests data, hits and misses per patient

A researcher came to me for help with an academic paper figure. He and his collaborator were comparing five literature-accepted methods for identifying which patients might have a certain common disease from their medical records. Out of 60,000 patients, about 10% showed a match with at least one of the tests. The resulting data was a spreadsheet with a column of patient IDs, and five columns of tests, with a one if a patient was identified as having the disease by that particular test, and a zero if their records didn’t match for that test. As you can see in the figure, there were many inconsistencies between who seemed to have the disease across the five tests!

So you want to build a snowman

Five tests overlap, original Venn diagram

The researchers wanted a visualization to represent the similarities and differences between the test results. Specifically, they wanted to make a Venn diagram, which consists of ellipsoids representing overlapping sets. They had an example they’d drawn by hand, but wanted help making it into an accurate depiction of their data. I resisted, explaining that I didn’t know of a program that would accomplish what they wanted, and that it is likely to be mathematically impossible to take their five-dimensional data set and represent it quantitatively as a Venn diagram in 2D. Basically, you can’t get the areas of all of the overlapping regions to be properly proportional to the number of patients that had hits on all of the combinations of the five tests. The Venn diagram works fine schematically, as a way to get across an idea of set overlap, but it would never be a data visualization that would reveal quantitative patterns from their findings. At worst, it would be a misleading distortion of their results.

Count me in

Five tests data pairwise table with colored cells

His other idea was to show the results as a table of numbers in the form of a matrix. Each of the five tests was listed across the top and the side, and the cell contents showed the quantity of patients who matched on that pair of tests. The number matching on a single test was listed on the diagonal. Those patterns can be made more visual by coloring the cells with “conditional formatting” in Excel, but the main problem with the table is that it hides a bunch of interesting data! We don’t see any of the numbers for people who hit on the various combinations of three tests, or four, or the ones that hit on all five.
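For readers who want to build this kind of pairwise table themselves, here is a minimal sketch in plain Python rather than the spreadsheet used in the consult. The patient rows are made up, and the diagonal here holds the total number of patients who hit each test, which is one reading of the table described above.

```python
TESTS = "abcde"

# Made-up data: patient ID mapped to five 0/1 flags, one per test (a to e).
patients = {
    "P1": (1, 1, 1, 0, 0),
    "P2": (0, 1, 1, 0, 0),
    "P3": (0, 0, 1, 1, 0),
    "P4": (0, 0, 0, 0, 1),
    "P5": (1, 1, 1, 1, 1),
}


def pairwise_counts(hits_by_patient, n_tests):
    """Count the patients who hit both test i and test j, for every pair (i, j)."""
    counts = [[0] * n_tests for _ in range(n_tests)]
    for hits in hits_by_patient.values():
        for i in range(n_tests):
            for j in range(n_tests):
                if hits[i] and hits[j]:
                    counts[i][j] += 1
    return counts


matrix = pairwise_counts(patients, len(TESTS))

# Print the table with the test letters across the top and down the side.
print("  " + " ".join(TESTS))
for letter, row in zip(TESTS, matrix):
    print(letter, " ".join(str(value) for value in row))
```

Patient P5 hits all five tests, yet in the table that patient only shows up as one more count in every cell, which is exactly the information loss described above.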

Five test data heatmap and number of tests hit per patient

I suggested we start exploring the hit combinations by creating a heatmap of the original data, but sort the patients (along the horizontal axis) by how many tests they hit (listing the tests up the vertical axis). Black circles are overlaid showing the number of tests hit for any given patient.
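The sorting step is easy to try on a small scale. The sketch below is only an illustration with made-up patients, not the tool used for the actual figure: it orders patients by how many tests they hit (ties broken by the hit pattern, which is my own assumption) and prints a rough text version of the heatmap.

```python
TESTS = "abcde"

# Made-up data: patient ID mapped to five 0/1 flags, one per test (a to e).
patients = {
    "P1": (1, 1, 1, 0, 0),
    "P2": (0, 1, 1, 0, 0),
    "P3": (0, 0, 1, 1, 0),
    "P4": (0, 0, 0, 0, 1),
    "P5": (1, 1, 1, 1, 1),
}


def sort_by_hit_count(hits_by_patient):
    """Order patients by number of tests hit, then by pattern so equal rows sit together."""
    return sorted(hits_by_patient.items(), key=lambda item: (sum(item[1]), item[1]))


ordered = sort_by_hit_count(patients)
hit_counts = [sum(hits) for _, hits in ordered]

# One text row per test (e at the top, a at the bottom), one column per patient.
for index in reversed(range(len(TESTS))):
    cells = "".join("#" if hits[index] else "." for _, hits in ordered)
    print(TESTS[index], cells)

# Final row: number of tests hit by each patient, in the same column order.
print(" ", "".join(str(count) for count in hit_counts))
```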

There are too many patients / lines here to show clearly the combinations of tests, but this visualization already illuminated two things that made sense to the researchers. First, there is a lot of overlap between ALPHA (a) and BETA-based (b) tests, and between GAMMA Method (c) and Modified GAMMA (d), because these test pairs are variations of each other. Second, the overlaps indicate a way the definitions are logically embedded in each other; (a) is a subset of (b), and (b) is for the most part a subset of (c).

Five tests combinations Tableau bubble plot

My other initial idea was to show the numbers of patients identified in each of the test overlaps as bubbles in Tableau. Here I continue the shorthand of labeling each test by the letters [a,b,c,d,e], ordered from ALPHA to OMEGA. The number of tests hit is both separated in space and encoded in the color (low to high = light to dark).
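The bubble sizes come from counting how many patients fall into each exact combination of tests. As a rough sketch of that counting step (in plain Python with made-up rows, not the Tableau workbook from the consult), each row of 0/1 flags can be turned into a label such as "_bc__" and tallied. With five tests there are 2^5 - 1 = 31 possible non-empty combinations.

```python
from collections import Counter

TESTS = "abcde"

# Made-up rows of 0/1 flags, one flag per test (a to e).
rows = [
    (1, 1, 1, 0, 0),
    (1, 1, 1, 0, 0),
    (0, 1, 1, 0, 0),
    (0, 0, 1, 1, 0),
    (0, 0, 0, 0, 1),
    (1, 1, 1, 1, 1),
    (0, 0, 0, 0, 0),  # no hits at all: left out of the counts below
]


def combo_label(hits):
    """Turn flags such as (0, 1, 1, 0, 0) into a label such as '_bc__'."""
    return "".join(letter if hit else "_" for letter, hit in zip(TESTS, hits))


def combination_counts(hit_rows):
    """Count patients per exact combination of tests, skipping patients with no hits."""
    return Counter(combo_label(hits) for hits in hit_rows if any(hits))


counts = combination_counts(rows)

# List combinations from fewest to most tests hit, as in the bubble plot.
for label, count in sorted(counts.items(), key=lambda kv: (5 - kv[0].count("_"), kv[0])):
    print(label, count)
```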

Add some (effective?) dimensions

I felt the weakness of this representation was that the bubbles were not spatially associated with their corresponding tests. Inspired by multi-dimensional radial layouts such as those used in the Stanford dissertation browser, I created a chart (in Adobe Illustrator) with five axes for the tests. I wanted each bubble to “feel a pull” from each of the passed tests, so it made sense to place the five-hit “abcde” bubble at the center, and each single-test bubble (“_b___”, “__c__”, “____e”) right by its letter, making the radius naturally correspond to the number of tests hit. Other bubbles were placed (manually) in between their combination of axes / tests.
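The bubbles in this chart were positioned by hand, but a first-pass placement could be computed. The sketch below is a hypothetical starting point, not the method used for the figure: it assumes five evenly spaced axes, sets the radius from the number of tests hit (1.0 for a single test, 0.0 for all five), and points each bubble along the sum of the unit vectors of its tests. The function names `axis_angle` and `bubble_position` are my own.

```python
import math

TESTS = "abcde"


def axis_angle(index, n_axes=len(TESTS)):
    """Angle in radians of one test axis; axes are evenly spaced, the first pointing up."""
    return math.pi / 2 - 2 * math.pi * index / n_axes


def bubble_position(label):
    """Return (x, y) for a combination label such as '_bc__'."""
    if len(label) != len(TESTS):
        raise ValueError("label must have one character per test")
    hit_indices = [i for i, ch in enumerate(label) if ch != "_"]
    if not hit_indices:
        raise ValueError("label must include at least one test")
    n_axes = len(TESTS)
    # Assumed radius rule: 1.0 for a single test, shrinking linearly to 0.0 for all five.
    radius = (n_axes - len(hit_indices)) / (n_axes - 1)
    if radius == 0:
        return (0.0, 0.0)
    # Direction: sum of the unit vectors of the hit axes (the "pull" of each test).
    # For five evenly spaced axes this sum is nonzero unless every test is hit.
    x = sum(math.cos(axis_angle(i)) for i in hit_indices)
    y = sum(math.sin(axis_angle(i)) for i in hit_indices)
    norm = math.hypot(x, y)
    return (radius * x / norm, radius * y / norm)


for label in ["a____", "_b___", "ab___", "_bc_e", "abcde"]:
    x, y = bubble_position(label)
    print(label, round(x, 3), round(y, 3))
```

Positions computed this way would still need manual nudging to keep bubbles from overlapping, and combinations of non-adjacent tests (such as "_bc_e") land in spots that are hard to read, which matches the ambiguity noted later for that bubble.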

Five test combinations hits polar bubble plot

The researchers liked this version. It was eye-catching, and the gravitation of bubbles in the b/c quadrant vaguely illustrated the pattern of hits and known test subsets. One criticism, though, was that it was a bit confusing – it wasn’t obvious how the bubbles were placed around the circles, and it might take people too long to figure out how to read the plot. It also took up a lot of page space.

Give these sets a hand

One of the collaborators, after seeing this representation, suggested trying to use it as the basis for an Euler diagram. Like a Venn diagram, it’s a visualization used to show set inclusion and overlap, but unlike in a Venn, an Euler is drawn using arbitrary shapes surrounding existing labels or images representing the set members. I thought it was an interesting idea, but I initially dismissed it as too difficult. I had already put more time than I typically spend on a consult into this visualization (our service model is to help people learn how to make their own visualizations, not produce visualizations for them). Also, I had never made an Euler diagram. While I had seen some good talks about them, I didn’t have any software on hand which would automate the process. So, I responded that the researchers should feel free to try creating curves around the sets themselves, but I wasn’t interested in pursuing it further.

Five tests polar bubbles with hand-drawn set boundaries

About two minutes after I sent the email, I began looking at the diagram and wondering if I could draw the sets! I printed out a black and white copy and started drawing lines with colored pencils, making one enclosing shape for each test [a-e]. It turned out that my manual layout resulted in fairly compact curves, except for “_bc_e”, which had ambiguous positioning, anyway.

Five tests first draft Euler diagram

The curve drawing was so easy that I started an Illustrator version. I kept the circles’ area the same (corresponding to their quantitative value) but pushed them around to make the set shapes more compact.

Ironically, I had come back almost exactly to the researchers’ original idea! The important distinction is that the bubbles keep it quantitative, with the regions only representing set overlap.

We’ve come full ellipsoid

Angela Zoss constructively pointed out that there were now too many colors, and the shades encoding number of hits weren’t necessary. She also felt the region labels weren’t clear. Those fixes, plus some curve smoothing (Path -> Simplify in Illustrator) led me to a final version we were all very happy with!

It’s still not a super simple visualization, but both the quantitative and set overlap patterns are reasonably clear. This result was only possible, though, through trying multiple representations and getting feedback on each!

Five tests final quantitative bubble Euler diagram

If you’re interested in learning how to create visualizations like this yourself, sign up for the DVS announcements listserv, or keep an eye on our upcoming workshops list. We also have videos of many past workshops, including Angela’s Intro to Effective Data Visualization, and my Intro to Tableau, Illustrator for Charts, and Illustrator for Diagrams.

Fall Data and Visualization Workshops

2017 Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series.  Our workshops are designed for researchers who are new to data-driven research as well as those looking to expand skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of different data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro

Python

11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 

Stata

09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 

Fall 2016 DVS Workshop Series

Data and Visualization Services is happy to announce its Fall 2016 Workshop Series. Learn new ways of enhancing your research with a wide range of data-driven research methods, data tools, and data sources.

Can’t attend a session?  We record and share most of our workshops online.  We are also happy to consult on any of the topics above in person.  We look forward to seeing you in the workshops, in the library, or online!

Data Sources
 
Data Cleaning and Analysis
 
Data Analysis
Introduction to Stata (Two sessions: Sep 21, Oct 18)
 
Mapping and GIS
Introduction to ArcGIS (Two sessions: Sep 14, Oct 13)
ArcGIS Online (Oct 17)
 
Data Visualization

Visualizing Qualitative Data (Oct 19)
Visualizing Basic Survey Data in Tableau – Likert Scales (Nov 10)

2016 Student Data Visualization Contest Winners

Thanks to an earlier fall deadline, we are ready to announce the winners of our fourth year of the Duke Student Data Visualization Contest.  The 14 visualizations submitted highlighted some very exciting visualization work being done by students of all ages here at Duke. The winners and other submissions to the contest will soon be featured on the Duke Data Visualization Flickr Gallery.

As in the past, the submissions were judged on the basis of five criteria: insightfulness, broad appeal, aesthetics, technical merit, and novelty.  The three winning submissions this year exemplify all of these and tell rich stories about three very different types of research projects. The winning submissions will be converted to larger poster versions and hung in the Brandaleone Lab for Data and Visualization Services (in the Edge).  Be on the lookout later this semester for a reception to celebrate their hard work!  The winners will also receive Amazon gift cards.  We are very grateful to Duke University Libraries for their continuing support of the contest.

First place:

Global Flows of Agriculture and Forestry Feedstocks
Brandon Morrison, Ph.D. Candidate (Division of Earth & Ocean Sciences, NSOE)

Second place:

Feature Interpretations from Ground Penetrating Radar at Vulci, Italy
Katherine McCusker, Ph.D. Student (Art History)

Third place:

Simulated Sediment Deposition at Continental Margins
Candise Henry, Ph.D. Student (Division of Earth & Ocean Sciences, NSOE)

Please join us in celebrating the outstanding work of these students!

Duke welcomes artist/illustrator Jennifer McCormick

On the last day of classes, December 4, the Duke community will have a very special treat: a visit from artist and certified medical illustrator Jennifer McCormick.  Jennifer has been actively exhibiting and speaking about her work for several years, including a recent TEDx talk at Wake Forest University and an exhibit at the Durham Arts Council.

In Jennifer’s work as a medical illustrator, she partners with attorneys to create visualizations that explain complex injuries and medical procedures to jury members.  In her fine art, however, she builds on the histories and x-rays of patients to explore “an opportunity for healing, hope, and acceptance.”  Her unique pieces transform the original clinical imagery of the injury into gorgeous, natural, holistic scenes.  In her artist talks, she speaks of “the power of intention” and “our forgotten superpowers” to raise awareness of the importance of art and spirituality for healing.

Jennifer will join us for the final Visualization Friday Forum of the semester.  It will be an opportunity for visualization enthusiasts, clinicians, medical imaging specialists, legal scholars, and those interested in the intersection between health and art to gather together for a presentation and conversation.  The talk will occur in the standard time slot for the Visualization Friday Forum (noon on Friday, December 4), but the location is changing to accommodate a larger audience.  For one week only, we will meet in Duke Hospital Lecture Hall 2003.

The Visualization Friday Forum is sponsored by the Duke University Libraries (Data and Visualization Services), Duke Information Science + Studies (ISS), and the DiVE group. Jennifer’s visit will also be sponsored by the Trent Center for Bioethics, Humanities & History of Medicine and Duke Law – Academic Technologies.

We are so excited Jennifer has agreed to travel to Duke for a visit.  Please mark your calendars for this event.  If you would like to speak with Jennifer about medical illustrations or the intersection between medicine and spirituality, please contact Angela Zoss.

Enter the 2016 Student Data Visualization Contest

Calling all Duke undergrad and grad students! Have you worked on a course or research project that included some kind of visualization? Maybe you made a map for a history class paper. Maybe you invented a new type of chart to summarize the results of your experiment. Maybe you played around with an infographic builder just for fun.

Now is the time to start thinking about submitting those visualizations to the Duke Student Data Visualization Contest. It’s easy — just grab a screenshot or export an image of your visualization, write up a short description explaining how you made it, and submit it using our Sakai project site (search for “2016 DataVis Contest”). The deadline is right after finals this fall, so just block in a little extra time at the end of the semester once you’re done with your final assignments and projects.

Not sure what would work as a good submission? Check out our Flickr gallery with examples from the past two years.

Not sure if you’re eligible? If you were a Duke student (that is, enrolled in a degree-granting program, so no post-docs) any time during 2015, and you did the work while you were a student, you’re golden!

Want to know more about the technical details and submission instructions? Check out the full contest instruction site.