April 2021 - Duke Libraries Center for Data and Visualization Sciences

Author: Dr. Kaylee P. Alexander
Website: www.kayleealexander.com
Twitter: @kpalex91

Since the summer of 2018 I have been working with a set of nineteenth-century commercial almanacs for the city of Paris. As my dissertation focused heavily on the production of stone funerary markers during this period, I wanted to consult these almanacs to get a sense of how many workers were active in this field of production. Classified as marbriers (stonecutters), the makers of funerary monuments were often one in the same as those who executed various other stone goods and constructions. These almanacs represented a tremendous source of industry information, consistently recording enterprise names and addresses, as well as, at times, specific information about the types of products the enterprise specialized in and any awards they might have won for their work, and what types of new technologies they employed. An so I decided to make a database.

As a Humanities Unbounded graduate assistant with the Center for Data and Visualization Sciences during the summer of 2020, I had the opportunity to explore some of the issues related to database construction and management faced by humanists working on data-based research projects. In order to work out some of these issues, I worked to set up a MySQL database using my commercial almanacs data as a test case to determine which platforms and methods for creating queryable databases would be best suited for those working primarily in the humanities. In the process of setting up this database what became increasingly clear was the need to make clear the usefulness of this process for other humanists undertaking data-driven projects, as well as identify ways of transforming single spreadsheets of data into relational data models without needing to know how to code. Thus, in this blog post I offer a few key points about relational database models that may be useful for scholars in the humanities and share my experiences in constructing a MySQL database from a single Excel spreadsheet.

First of all, some key terms and concepts. MySQL is an open-source relational database management system that uses SQL (Structured Query Language) to create, modify, manage and extract data from a relational database. A relational data model organizes data into a series of tables containing columns (‘attributes’) and rows (‘records’) with unique keys identifying each record. Each table (or, ‘relation’) represents a single entity type and its corresponding attributes. When working with a relational data model you want to make sure that your tables are normalized, or organized in such a way that reduces redundancy in the data set, increases consistency, and facilitates querying.

Although for the purposes of efficient data gathering, I had initially collected all of the information from the commercial almanacs in a single Excel spreadsheet, I knew that I ultimately wanted to reconfigure my data using a relational model that could be shared with and queried efficiently by others. The main benefits of a relational model include ensuring consistency as well as performing combinations of queries to understand various relationships that exist among the information contained in the various tables that would be otherwise difficult to determine from a single spreadsheet. An additional benefit to this system is the ability to add records and edit information without the risk of compromising other information contained in the database.

The first question I needed to ask myself was which entities from my original spreadsheet would become the basis for my relational database. In other words, how would my relational model be organized? What different relationships existed within the dataset, and which variables functioned as entities rather than attributes? One of the key factors in determining which variables would become the entities of my relational model, was the question of whether or not a given variable contained repeated values throughout the master sheet. Ultimately determining the entities wasn’t the trickiest part. It because rather clear early on that it would be best to first create separate tables for businesses and business locations, which would be related via a table for annual activity (in the process of splitting my tables I would end up making more tables, but these were the key starting points).

The most difficult question I encountered was how I would go about splitting all this information as someone with very limited coding experience. How could identify unique values and populate tables with relevant attributes without having to teach myself Python in a pinch, but also without having to retype over one hundred years’ worth of business records? Ultimately, I came up with a rather convoluted system that had me going back and forth between OpenRefine and Excel. Once I got the hang of my system it became almost second nature to me but explaining it to others was another story. This made it abundantly clear that there was a lack of resources for demonstrating how one could create what were essentially a series of normalized tables from a flat data model. So, to make a very long story short, I broke down my convoluted process into a series of simple steps that required nothing more that Excel to transform a flat data model into a relational data model using the UNIQUE() and VLOOKUP() functions. This processes is detailed in a library tutorial I developed geared towards humanists, consisting of both a video demonstrating the process and a PDF containing written instructions.

In the end, all I needed to do was construct the database itself. In order to do this I worked with phpMyAdmin, a free web-based user interface for constructing and querying MySQL databases. Using phpMyAdmin, I was able to easily upload my normalized data tables, manage and query my database, and easily connect to Tableau for data visualization purposes using phpMyAdmin’s user management capabilities.

Dr. Kaylee P. Alexander is a graduate of the Department of Art, Art History & Visual Studies, where she was also a research assistant with the Duke Art, Law & Markets Initiative (DALMI). Her dissertation research focuses on the visual culture of the cemetery and the market for funerary monuments in nineteenth-century Paris. In the summer of 2020, she served as a Humanities Unbounded graduate assistant with the Center for Data and Visualization Sciences at Duke University Libraries.

Monthly Archives: April 2021

Relational Thinking: Database Re-Modeling for Humanists