Tag Archives: rstats

Introducing Felipe Álvarez de Toledo, 2019-2020 Humanities Unbounded Digital Humanities Graduate Assistant

Felipe Álvarez de Toledo López-Herrera is a Ph.D. candidate at the Art, Art History, and Visual Studies Department at Duke University and a Digital Humanities Graduate Assistant for Humanities Unbounded, 2019-2020.  Contact him at askdata@duke.edu.

Over the 2019-2020 academic year, I am serving as a Humanities Unbounded graduate assistant in Duke Libraries’ Center for Data and Visualization Sciences. As one of the three Humanities Unbounded graduate assistants, I will partner on Humanities Unbounded projects and focus on developing skills that are broadly applicable to support humanities projects at Duke. In this blog post, I would like to introduce myself and give readers a sense of my skills and interests. If you think my profile could address some of the needs of your group, please reach out to me through the email above!

My own dissertation project began with a data dilemma. 400 years ago, paintings were shipped across the Atlantic by the thousands.  They were sent by painters and dealers in places like Antwerp or Seville, for sale in the Spanish colonies. But most of these paintings were not made to last. Cheap supports and shifting fashions guaranteed a constant renewal of demand, and thus more work for painters, in a sort of proto-industrial planned obsolescence.[1]As a consequence, the canvas, the traditional data point of art history, was not a viable starting point for my own research, rendering powerless many of the tools that art history has developed for studying painting. I was interested in examining the market for paintings as it developed in Seville, Spain from 1500-1700; it was a major productive center which held the idiosyncratic role of controlling all trade to the Spanish colonies for more than 200 years. But what could I do when most of the work produced within it no longer exists?

This problem drives my research here at Duke, where I apply an interdisciplinary, data-driven approach. My own background is the product of two fields: I obtained a bachelor’s degree in Economics in my hometown of Barcelona, Spain in 2015 from the Universitat Pompeu Fabra, and simultaneously attended art history classes in the University of Barcelona. This combination found a natural mid-way point in the study of art markets. I came to Duke to be a part of DALMI, the Duke, Art, Law and Markets Initiative, led by Professor Hans J. Van Miegroet, where I was introduced to the methodologies of data-driven art historical research.

Documents in Seville’s archives reveal a stunning diversity of production that encompasses the religious art for which the city is known, but also includes still lives, landscapes and genre scenes whose importance has been understated and of which few examples remain [Figures 1 & 2]. But analysis of individual documents, or small groups of them, yields limited information. Aggregation, with an awareness of the biases and limitations in the existing corpus of documents, seems to me a way to open up alternative avenues for research. I am creating a database of painters in the city of Seville from 1500-1699, where I pool known archival documentation relating to painters and painting in this city and extract biographical, spatial and productive data to analyze the industry. I explore issues such as the industry’s size and productive capacity, its organization within the city, reactions to historical change and, of course, its participation in transatlantic trade.

This approach has obliged me to become familiar with a wide range of digital tools. I use OpenRefine for cleaning data, R and Stata for statistical analysis, Tableau for creating visualizations and ArcGIS for visualizing and generating spatial data (see examples of my own work below [Figures 3-4]). I have also learned the theory behind relational databases and am learning to use MySQL for my own project; similarly, for the data-gathering process I am interested in learning data-mining techniques through machine learning. I have been using a user-friendly software called RapidMiner to simplify some of my own data gathering.

Thus, I am happy to help any groups that have a data set and want to learn how to visualize it graphically, whether through graphs, charts or maps. I am also happy to help groups think about their data gathering and storage. I like to consider data in the broadest terms: almost anything can be data, if we correctly conceptualize how to gather and utilize it realistically within the limits of a project. I would like to point out that this does not necessarily need to result in visualization; this is also applicable if a group has a corpus of documents that they want to store digitally. If any groups have an interest in text mining and relational databases, we can learn simultaneously—I am very interested in developing these skills myself because they apply to my own project.

I can:

  • Help you consider potential data sources and the best way to extract the information they contain
  • Help you make them usable: teach you to structure, store and clean your data
  • And of course, help you analyze and visualize them
    • With Tableau: for graphs and infographics that can be interactive and can easily be embedded into dashboards on websites.
    • With ArcGIS: for maps that can also be interactive and embedded onto websites or in their Stories function.
  • Help you plan your project through these steps, from gathering to visualization.

Once again, if you think any of these areas are useful to you and your project, please do not hesitate to contact me. I look forward to collaborating with you!

[1]Miegroet, Hans J. Van, and Marchi, ND. “Flemish Textile Trade and New Imagery in Colonial Mexico (1524-1646).” Painting for the Kingdoms. Ed. J Brown. Fomento Cultural BanaMex, Mexico City, 2010. 878-923.

 

Fall Data and Visualization Workshops

2017 Data and Visualization Workshops

Visualize, manage, and map your data in our Fall 2017 Workshop Series.  Our workshops are designed for researchers who are new to data driven research as well as those looking to expand skills with new methods and tools. With workshops exploring data visualization, digital mapping, data management, R, and Stata, the series offers a wide range of different data tools and techniques. This fall, we are extending our partnership with the Graduate School and offering several workshops in our data management series for RCR credit (please see course descriptions for further details).

Everyone is welcome at Duke Libraries workshops.  We hope to see you this fall!

Workshop Series by Theme

Data Management

09-13-2017 – Data Management Fundamentals
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-26-2017 – Writing a Data Management Plan
10-03-2017 – Increasing Openness and Reproducibility in Quantitative Research
10-18-2017 – Finding a Home for Your Data: An Introduction to Archives & Repositories
10-24-2017 – Consent, Data Sharing, and Data Reuse 
11-07-2017 – Research Collaboration Strategies & Tools 
11-09-2017 – Tidy Data Visualization with Python

Data Visualization

09-12-2017 – Introduction to Effective Data Visualization 
09-14-2017 – Easy Interactive Charts and Maps with Tableau 
09-20-2017 – Data Visualization with Excel
09-25-2017 – Visualization in R using ggplot2 
09-29-2017 – Adobe Illustrator to Enhance Charts and Graphs
10-13-2017 – Visualizing Qualitative Data
10-17-2017 – Designing Infographics in PowerPoint
11-09-2017 – Tidy Data Visualization with Python

Digital Mapping

09-12-2017 – Intro to ArcGIS Desktop
09-27-2017 – Intro to QGIS 
10-02-2017 – Mapping with R 
10-16-2017 – Cloud Mapping Applications 
10-24-2017 – Intro to ArcGIS Pro

Python

11-09-2017 – Tidy Data Visualization with Python

R Workshops

09-11-2017 – Intro to R: Data Transformations, Analysis, and Data Structures  
09-18-2017 – Reproducibility: Data Management, Git, & RStudio 
09-25-2017 – Visualization in R using ggplot2 
10-02-2017 – Mapping with R 
10-17-2017 – Intro to R: Data Transformations, Analysis, and Data Structures
10-19-2017 – Developing Interactive Websites with R and Shiny 

Stata

09-20-2017 – Introduction to Stata
10-19-2017 – Introduction to Stata 

 

 

 

 

 

 

 

 

 

 

 

 

Scaling Support: Designing Data for a Growing Statistics Program

r_stats101How do you support 57,860 online students learning R and statistics ?  Late last fall, Data and GIS Services shared this challenge with Professor Mine Çetinkaya-Rundel and the staff of CIT as we sought to translate Professor Çetinkaya-Rundel’s successful Statistics 101 course to a Coursera class on Data Analysis and Statistical Inference.  While Data and GIS Services has supported Statistics 101 students for several years identifying appropriate data and using the R statistical language for their assignments, the scale of the Coursera course introduced new challenges of trying to provide engaging data to a very large audience without having the opportunity to provide direct support to everyone in the class.

In our initial meetings with Professor Çetinkaya-Rundel, she requested that Data and GIS create data collections for the course that would provide easy access in R and would include a range of statistical measures that would appeal to the diverse audience in the class.  The first challenge — easy access to R — required some translation work.  While R excels in its flexibility, graphics, and statistical power, it lacks some of the built in data documentation features present in other statistical packages.  This project prompted Data and GIS to reconsider how to provide documentation and pre-formatted R data to an audience that would likely be unfamiliar with R and data documentation.

The second challenge — finding data that covered a wide range of interesting topics — proved much easier.  The General Social Survey with its diverse and engaging questions on a wide range of topics proved to be an easy choice for the class.  The American National Election Studies, also offered a diverse set of measures of public opinion that suited the course well.  With these challenges identified and addressed, we spent the end of 2013 selecting portions of the data for class (subsetting), abridging the data documentation for instructional use, and transforming the data to address its usage in an online setting (processing missing values for R, creating factor variables).

As Professor Çetinkaya-Rundel’s class launches on February 17th, this project has given us a new appreciation of providing data and statistical services in a MOOC while also building course materials that we are using in Statistics 101 at Duke.  While students begin the Coursera course on Data Analysis and Statistical Inference, students in Professor Kari Lock Morgan’s Statistics 101 class will use these data in their on-campus Duke course as well.  We hope that both collections will reduce some of the technological hurdles that often confront courses using R as well as improving statistical literacy at Duke and beyond.