After Integrating Digital Papyrology [Baumann, Cayless, Sosin]
Duke University recently completed a multi-year project called Integrating Digital Papyrology, or IDP, with the generous support of the Andrew W. Mellon Foundation. Under IDP we in effect did three things. One, we united the Duke Databank of Documentary Papyri, or DDbDP (a repository of editions of papyrological documents), the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens, or HGV (a database of scholarly information about those documents: date, findspot, bibliography, etc.), the Advanced Papyrological Information System, or APIS (a repository of digital images and catalog records of papyri held in institutional collections), and the Bibliographie Papyrologique, or BP (a quarterly scientific bibliography of the discipline), all under a common technical standard (EpiDoc, TEI XML), and all searchable under a fairly intuitive interface. Two, we built a platform for open and transparent, peer-reviewed, version-controlled, community-based, scholarly curation of all of these disciplinary resources. None of them is a black box any longer. Three, we published all of our data and code under open licenses. We give it away.
Since IDP’s launch in 2010, more than 600 users have registered as contributors. Something like 6000 new texts have been entered. Tens of thousands of discrete edits have been committed. We have added Coptic. We are working toward adding Demotic Egyptian and Arabic. The DDbDP now has an editorial board of a dozen international papyrologists, some of whom are even grad students (and they do an amazing job). Grad students at Michigan and Berkeley are publishing directly to the DDbDP emendations of editions made by some of the world’s best papyrologists. Undergrads at BYU have organized to translate the largest archive of papyrological documents. One of our most active contributors is a retired high-school teacher, who has found a sort of second intellectual life entering texts in the DDbDP. Other projects, like Pelagios, are leveraging the geo data that we expose. The Trismegistos project has built an onomastic and prosopographic project around our data and even a preliminary and completely new browse interface to the DDbDP. Parts of the underlying code are being repurposed under a new project by the Perseus Digital Library, and we are working with other non-Classics projects that may do the same. It is still early days, but in some important ways, we have been successful.
So, we thought we’d offer a few lessons that we learned along the way, some of them the hard way, but all of them, we hope, useful to others.
You can’t do it all yourself. Digital Classics is rarely going to be a one-person show. Mastering both the scholarly and the technical (itself often scholarly) sides of a project is usually going to be more than you can handle alone. You need help. Finding that help is the trick. Don’t assume that you can just get funding, find a “tech person”, and have them do the digital bits while you do the scholarship, any more than you’d expect to rent an archaeologist, or a historian, or a Latinist for a project.
Be patient. It takes time to build up a team of collaborators. It will take longer than you expect to do any project of significance. If you go out for funding, you might not be successful the first time. Don’t give up, but do solicit and pay attention to criticism. The NEH funds “startup grants” of up to $60,000 to get projects off the ground. I’ve applied to this program both successfully and unsuccessfully, and I’ve sat on a review panel for it as well. I’ve seen virtually the same proposal be rejected one year, then accepted the next. I’ve also seen good proposals torpedoed by one bad review(er) more than once. Sometimes a proposal will have fixable problems and the reviewers’ comments will help you fix them.
Programming is not magic and programmers are not wizards. Holding the technical aspects of a project at arm’s length is a recipe for failure. People may do this because they feel technically incompetent and unable to understand what the programmers do, or because they think the programmers are merely implementing their ideas. Developers are smart enough to understand what you want and you are smart enough to understand what they do. But you need to talk to each other a lot.
Done does not mean “done”. Unlike some traditional scholarly activities (monograph and article publishing, for example), digital projects can have a long development life beyond ‘publication’, and there’s no set point where you hand them over to someone else’s care. That doesn’t mean you can never stop working on them, but it does mean you should think about how to leave them so that someone else may use them later. Your data might be useful to someone else; your code might be reusable; the ways in which you solved problems (or failed to) might inspire others. The goal is to publish your work so that this is possible. Think of yourself as a stonemason working on part of a cathedral (a project that will go on for generations) rather than a painter producing a picture.
Set your work free. Academics are sometimes hesitant to give away digital efforts. There is a whole thriving ecosystem dedicated to the responsible management of copyright for digital artifacts, code, etc. One of the main players in this ecosystem is called Creative Commons. It provides licenses you can attach to digital content that make it clear to potential users what they can and cannot do with your stuff. If there is no predetermined end to your project, it becomes even more important that it can outlive you (or your interest). Open, non-restrictive licenses like CC-Attribution can enable that. Avoid the more restrictive licenses like the NonCommercial variants.
Be a techno-realist, not a techno-utopian. Digital methods and tools are probably not going to change the essential character of your teaching and research (not in the short term, anyway). They can: make certain tasks go much faster, allow you to reach more people, and allow you to present your work in new ways. They won’t: make you smarter, do the intellectual lifting for you, or erase ambiguity and complexity. Hard tasks will probably still be hard.
The environment isn’t static. The technology landscape in which your project operates isn’t fixed. It will change, in ways both good and bad. You need to be aware of this, and especially of the fact that you can help improve the landscape yourself. A standard like the TEI, for example, can be changed to suit you if you run into a problem that it doesn’t address, or doesn’t address well. Projects that were done may need to be re-done or made better; systems that worked may stop working if you upgrade components; ideas that wouldn’t have worked in years past may be achievable now, all because of changes in technology.
The environment does, and should, constrain you. You will have heard the maxim “Don’t reinvent the wheel”. This is facile and may or may not be actually true in your specific case, but it does contain a useful point, namely that you should avoid (where you can) inventing new things. Because any new thing is going to have to plug into a complex environment–an environment that includes not just software, but standards, users, developers, institutions, and other projects–it’s going to have to adapt and evolve to fit into that environment. You aren’t just going to get it right the first time and move on. This means developing any really new thing is usually going to be much harder and take much longer than you expect, and it will also carry much greater risk. This is not at all to say “don’t invent”, but rather be strategic about it, and use available standards, software, and methods whenever you can.
Dream large; build small. Gall’s Law states: “A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.” Small, individually useful pieces make for better systems and are useful outside the context in which they were built. These pieces don’t have to be code. They can be methods, theories, or documents (text files, images, etc.) too.
In DH, the theory is the substance. The design is the substance. The substance is code and data. And vice-versa. There is no partition between them. We think about things in terms of a graph because we have a graph. We think about things in terms of big data because we have big data. Theory determines the very things you are even able to think about and discuss.
This claim is not new or unique to DH. There are distinct parallels in physics, philosophy of science, and other disciplines. Philosophy of science similarly uses physics as its benchmark for “reality” or a “serious discipline”.
DH is definitionally the intersection of “Digital” and “Humanities”. So tools are theory. Data is theory. But the humanities have their own theories as well. There can be friction when these theories and models don’t match. That’s expected. But that doesn’t mean we have to resolve it by allowing one theory to prevail. I believe digital humanities can be richer when we resolve this friction by revising theories. Moreover, this process is what allows digital humanities to exist as a thing with its own theories. Not a suspension or emulsion of theories, but a solution. By such an approach it can also “progress” by evolutionary means: the mixing, intercombination, and offspring of various theories in contention with one another.
Various arguments have been made over whether DH represents a “colonization of the humanities by computer science” or the reverse, but it is best when it is neither; or rather: both – an integration.
Thus, this model of DH is recognizable in the ‘maxims’ expressed already:
- “Don’t assume you can just get funding, find a ‘tech person’, have them do the digital bits and you just do the scholarship”. If the tech determines what you can think about and express, and the tech person is unfamiliar with the sort of humanistic inquiry you’re interested in, this is perilous.
- “You should start by doing what you can by yourself anyway.” This goes for both those who identify more with the “digital” or the “humanities”, and even those who claim neutral ground. It exposes you to unfamiliar theory and trying to mash it together with your pre-existing theory will force you to confront your own assumptions.
- “the developers are smart enough to understand what you want and you are smart enough to understand what they do.” At the start, however, neither of you may understand either. Communicating your mental model of what you want to achieve, and what you believe is achievable, is laden with theory and assumptions about the nature of things. Agreeing on what you want and what you want to do can be even harder. But there is no solution that does not grow from prolonged, careful conversation.
- “The environment does, and should, constrain you. […] This means developing any really new thing is usually going to be much harder and take much longer than you expect, and it will also carry much greater risk.” Exactly.
I conclude with a small selection of quotes which I hope are illustrative of some of my perhaps more contentious points on the nature of tools and theories:
“… minds unduly fascinated by computers carefully confine themselves to asking only the kind of question that computers can answer….” Lewis Mumford, “The Sky Line: ‘Mother Jacobs Home Remedies’,” The New Yorker, December 1, 1962, p. 148.
“The tools we employ at once extend and limit our ability to conceive the world.” David Hestenes, Oersted Medal Lecture 2002: Reforming the Mathematical Language of Physics
“Whether you can observe a thing or not depends on the theory which you use. It is the theory which decides what can be observed.” – Albert Einstein
“I tried to bring home the same point to a group of physics students in Vienna by beginning a lecture with the following instructions: “Take pencil and paper; carefully observe, and write down what you have observed!” They asked, of course, what I wanted them to observe. Clearly the instruction, “Observe!” is absurd.” – Karl Popper
We want to close with a few words about recent developments at Duke, which we hope offer one way forward for research in Digital Classics and beyond. It’s not a silver bullet. But we think it heads in the right direction. As of July 1, 2013, the three of us comprise a research group, embedded in Duke University Libraries, called the Duke Collaboratory for Classics Computing, or DC3. It is named after the plane, which was built without a prototype, has been called a loose collection of parts flying in formation, and owed its longevity to its simple and flexible design. Our mission is threefold: (1) stewardship of the technical and social structures that we built under IDP, (2) R&D with a view to building standards, services, and tools that support a wide range of scholarly activities aimed first at classicists, but developed with a view to much wider applicability, and (3) teaching and outreach around these activities. We maintain independent research lives, but we work together: we spec together, we build together, we present together, we aim to publish together, and starting next year we will even teach together. We collaborate closely with partners at Duke, in the Library and outside it, and with other Classics and non-Classics projects around the world. I now have a joint appointment in the library (the first such at Duke), where Ryan and Hugh reside full time. Start-up costs are eased with the generous support of the Mellon Foundation, but beyond an initial proving period, we are hard-funded, hard-wired.
This is an expensive proposition, but many good things are, and the returns are worth it. This unique arrangement means that we are freed from the grant cycle and from the hierarchical model of faculty innovators and librarian service providers. We can play around with things on our own and as soon as they start to get messy turn to each other for help. We have time to let ideas grow, start to mature, crash and burn disastrously, rise from the ashes again…or not. We have the freedom, in other words, to fail and the time to learn from failure. We are in each other’s hair, which means there’s no magic or awe, just the respect that is born of regular, prolonged, close, collaborative contact with smart people, and lots of talking. We dream plenty (coffee, cookies, and 6 whiteboards are good for that), but then we DO; we live in the real world. We’ve built an environment that supports thinking and developing at the productive intersection of different mental models. And because we are permanent, we are able to evolve with the broader DH environment, to drive change where we can and must, to take the strategic long view about what to invest our time in, to take the time to find the sweet spot where our different views of the world complement each other rather than clash.
This doesn’t mean that every digital classics project must envisage an operation like ours! But all ought to think and design and build with a view to the future, with the strength of multiple diverse skills and interests, with a goal of meeting real scholarly needs, with a spirit of bold experimentation, and with a clear call for institutions to furnish the support, understanding, and incentives that make all of this possible. This is a tall order, but this period now is our 19th century; this period now is the time for us to lay foundations for what and how a thriving digital classics ecosystem should be. And the DC3 aims to be a sort of living experiment in trying to do just that.