Preserving Scholarship in a Digital World

by Cara Bonnett

Consider: The coolest thing to be done with your data will likely be thought of by someone else.

That’s the idea driving Paolo Mangiafico to explore new methods for managing and archiving the deluge of digital information at Duke. Mangiafico, formerly on the staff of the Duke Libraries, is the University’s new director of digital information strategy. His mission: to make sure Duke’s vast and varied digital output, from course Web sites and dissertations to wikis and raw scientific data, will be available to future scholars to use in ways we can’t currently imagine.

“The academy is based on building on the work someone has done before you,” Mangiafico said. “We need to provide incentives for people to share data, help other people get to that data and mash it up, and make sure the stuff persists over time. Someone might not think to do those mash-ups until twenty years from now.”

Mangiafico’s efforts, one element of a new University initiative funded in part by a Mellon Foundation grant, are aimed at developing not just a digital attic, but a technological infrastructure and set of policies that will add value to researchers’ current work. The endeavor also stirs up some sticky issues, such as how to turn research data into “knowledge in the service of society” with greater efficiency and how to reward digital collaboration in an academic environment.

Because the traditional tenure system is still tied to print publication, researchers may feel especially protective of their data, even though open access articles have a documented citation advantage, said Kevin Smith, Duke’s scholarly communications officer. “There’s a mental roadblock: If I make the data available, will somebody else jump my claim, take my data and publish my article before I do?” said Smith, whose blog, http://library.duke.edu/blogs/scholcomm/, explores legal issues such as authors’ rights, copyright and fair use.

Despite these concerns, the open access movement has made progress nationally, with a 2008 mandate from the National Institutes of Health requiring scientists to submit finished papers to the PubMed Central database to allow public access. And there has been progress at Duke, too. This spring, the University instituted a requirement that all theses and dissertations be contributed to an open access repository.

But there is further to go, said Ricardo Pietrobon, associate vice chair of surgery at Duke and director of Research on Research, a collaborative effort to maximize research productivity and patient outcomes. According to Pietrobon, inefficient access and distribution systems in biomedical research, for example, can mean a ten-year gap between publication of a clinical trial and implementation in clinical practice. “The more we can streamline that conversion of information to practice, the faster we can improve patient care,” he said.

Even at Duke, where the community is interested in sharing, coordinating parallel campus efforts can still be challenging. Systems are already in place to manage and preserve vital University records, such as Board of Trustees minutes, payroll records and student transcripts. And the University Archives works closely with the Office of News and Communication, for example, to preserve all Duke press releases and new multimedia content such as podcasts and “Duke on Camera” video clips.

However, while a 2006 survey of 120 interdisciplinary centers and 50 academic departments and programs across Duke identified a handful of existing digital repositories (see sidebar below), there are no long-term plans for management and preservation of “born digital” data such as electronic course catalogs or department newsletters, University Archivist Tim Pyatt said. “What worries me is the stuff I don’t know about—keeping track of the new content that comes up that doesn’t have that paper equivalent,” Pyatt said. “Hundreds of us are trying to find these solutions independently. We need to be thinking about this together, so we’re not spending multiple resources to solve the same problem.”

That’s where Mangiafico, who led the Duke Libraries’ first digitization projects in the 1990s, comes in. As more materials take on new life in the digital world, from Duke’s famed ancient papyri collection to past issues of Duke’s yearbook, the Chanticleer, he wants the Duke community to think more strategically about what is worth saving and, for what is saved, how those digital assets might be used in the future.

“It’s hard to decide what’s important in advance, but the tools and infrastructure we build now need to factor in the long term,” Mangiafico said. Only through that kind of forethought and coordination can the University facilitate the kind of data-driven “mash-ups” that will fuel the next generation of unexpected collaborations. Mangiafico predicts: “With enough eyeballs, you make better discoveries.”

About the Digital Information initiative

What: The initiative is a joint project of the Office of the Provost, the Duke University Libraries and the Office of Information Technology. It has been funded in part by a $325,000 grant from The Andrew W. Mellon Foundation, in partnership with Dartmouth College.

Who: Paolo Mangiafico, who was named Duke’s director of digital information strategy last fall, will support the provost’s new digital information steering committee, to be made up of faculty, archivists, information technology staff and representatives from other areas of the university.

Next steps: The committee will begin discussions this spring about goals, priorities, policies and potential pilot projects. Mangiafico also plans to assemble an informal group of information technology, library and other staff to share best practices and work toward common approaches.

Beyond Duke: Duke and Dartmouth share an advisory group to guide development of a digital information strategy that can serve as a model for other institutions. The advisory group, which comprises university information technology directors, library directors and vice provosts from Duke, Dartmouth, the University of Chicago, Princeton, Yale, University of Virginia and Williams, met for the first time in December and plans to meet again this fall.

Online repositories at Duke

Here are a few examples of existing campus repositories:

  • Duke Law Faculty Scholarship Repository: a full-text electronic archive of scholarly works by Duke Law faculty
  • Duke Student Portfolio: an electronic archive of undergraduate student work (text, audio and video files), managed by the College of Arts & Sciences
  • DukeSpace: a project of Duke Libraries and University Archives that provides access to electronic theses and dissertations, as well as selected University records
  • MedSpace: Duke Medicine Digital Repository, Medical Center Archives repository
  • Faculty Database System: an electronic collection that includes faculty directory information, as well as curricula vitae and research

Staying Ahead of the Digital Avalanche

Part detective, part digital archaeologist. That’s how electronic records archivist Seth Shaw sees himself. He excavates data from 3 1/2-inch floppy disks, digital camera memory cards, and hard drives circa 2000. A Nobel Laureate in economics even revealed his username and password to Shaw in order to donate his e-mail correspondence to the Duke Libraries.

And unlike the archivists of generations past who could set boxes of letters or old photographs aside for later cataloging, Shaw faces the twin ticking time bombs of technology obsolescence and “bit rot.” “When you stick papers in a box, you don’t have to go back and check every month to make sure they’re still there. You don’t assume the box is spontaneously going to die on you, like a hard drive might,” Shaw said.

Shaw is on the front lines of a new Duke initiative to preserve the “born digital” artifacts that might someday document the work of a future Nobel Laureate or help tell the story of campus life in 2009 when the University marks its 100th anniversary in 2024. His job highlights the uncertainty inherent in trying to ensure that the University’s most precious digital resources are preserved and usable beyond the short lifespan of current technologies.

In an effort to capture the first rough draft of Duke’s current history, for example, Shaw wrote his own computer program to copy seven years’ worth of multimedia files off servers and hard drives in the Office of News and Communication. He left with 211 gigabytes’ worth of University news and the knowledge that each passing day generates a new flood of digital data he can’t possibly hope to sift through, let alone store.
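The article does not describe how Shaw’s program works. As a rough illustration only, the sketch below shows one common way an archivist might copy files into an archive while guarding against “bit rot”: record a checksum for each file at copy time, then recompute the checksums later to spot files that have silently changed or vanished. The function names (`sha256_of`, `copy_with_manifest`, `find_damaged`) and the folder layout are hypothetical, chosen for this example, and are not taken from Duke’s actual tools.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_manifest(source: Path, archive: Path) -> dict[str, str]:
    """Copy every file under source into archive; return {relative path: checksum}.

    Hypothetical helper: the checksum is taken from the source file and the
    copy is verified against it before the entry is recorded.
    """
    manifest: dict[str, str] = {}
    for item in sorted(source.rglob("*")):
        if not item.is_file():
            continue
        relative = item.relative_to(source)
        target = archive / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        expected = sha256_of(item)
        shutil.copy2(item, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"copy of {relative} does not match the original")
        manifest[relative.as_posix()] = expected
    return manifest


def find_damaged(archive: Path, manifest: dict[str, str]) -> list[str]:
    """Return archived files that are missing or no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        candidate = archive / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(name)
    return damaged
```

Run on a schedule, a check like `find_damaged` is the digital equivalent of going back every month to make sure the box is still on the shelf. It only detects damage; repairing a file still depends on keeping a second good copy somewhere else.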

Meanwhile, Shaw also faces growing skittishness among prospective donors, who fear what one Duke student called the “promiscuous access” of online data sharing. “It’s one thing to have a box of papers on the shelf in the library that someone might pull down,” Shaw said. “It’s a whole different story if you donate your papers and someone could type your name into Google and find your files.”

In an era when the first draft of scholarship is written in wikis and blogs and researchers can store an entire career on a Flash drive, Shaw sometimes feels as if he’s trying to outrun an avalanche. “We’re trying to make preservation decisions in a foggy crystal ball,” Shaw said. “There is no future-proofing. We’ll always be trying to keep up with technology.”

Cara Bonnett is Managing Editor, News & Information, for Duke’s Office of Information Technology.