That is the question posed by Paul Duguid, a professor at UC Berkeley, the University of London and Santa Clara University, about the Google Books Project. His article, “Inheritance and loss? A brief survey of Google Books,” was just published in First Monday, a peer-reviewed online journal about the Internet.
Duguid’s point is that the Google Books project will really outstrip most other projects to digitize cultural artifacts, making them “appear inept or inadequate.” But the authority and quality of the Google project, Duguid argues, are based on a kind of inheritance from the reputation of the libraries involved. So Duguid sets out to see if Google really is the qualitative heir of Harvard and Stanford.
His results are disheartening. His search for a deliberately unconventional book, Sterne’s “Tristram Shandy,” returns results likely to confuse and discourage a casual reader. The first result on Google’s results list, a copy from Harvard, is so badly scanned that it is virtually illegible, with words cut off by the gutter on nearly every line. Elsewhere the text fades to indecipherable scratchings. And some of Sterne’s eccentricities are missing; the black page of mourning for the dead Parson Yorick simply is not included in the Google scan. When Duguid tries the second result from his search, things get worse. The first page of the scan is blank and the second page puts the reader at the end of chapter one and the beginning of chapter two, both in the second volume. Nothing informs the reader (other than comparison with a printed text) that they have been plunged into the middle of the book.
Duguid’s judgments on Google Books are harsh: the project ignores essential metadata like volume numbers, the quality of the scans is often inadequate, and sometimes editions that are best consigned to oblivion are given undeserved prominence for no discernible reason (that is his conclusion regarding the second text he found, from Stanford). Rather than inheriting quality from Harvard and Stanford, he concludes, “Google threatens not only its own reputation for quality and technological sophistication, but also those of the institutions that have allied themselves to the project.”
It is true that the real value of the Google Books Project is not so much to find reading matter for people as to direct them to which books are most likely to be of help or interest to them. Few people, one presumes, will try to read “Tristram Shandy” in the Google Books format. But the failures of visual quality and metadata control threaten even the more modest view of Google Books as a giant index. Without a higher degree of quality than Duguid discovered, it is hard to argue that Google is superior in any way to a comprehensive online catalog from a major library.
“It is true that the real value of the Google Books Project is not so much to find reading matter for people as to direct them to which books are most likely to be of help or interest to them.”
Kevin, I agree with you completely. Google is the access point for the content, but not the superior mode for viewing the content itself. This may change over time as readers may choose convenient access over complete and contextual content.
Google is becoming the Super Catalog, pulling the pieces together in one place, taking us beyond the citation and the table of contents, but for now we (libraries) still hold the content. For now…
I have just spent some time with Google Books in preparation for a class presentation, and I agree with Duguid regarding the clarity of the images. Many, even from recent works, were fuzzy and somewhat difficult to read. Google is definitely taking the “bigger is better” approach, as opposed to the Open Content Alliance, whose digital images are much crisper and who have provided useful help rather than “cool” features like Google’s “popular passages.” If 200 books include the same quotation from James Madison, does that mean I should put it in my paper, too?
A previous post noted, “Google is the access point for the content, but not the superior mode for viewing the content itself.” True enough, but surely the content itself affects the access. If there are large sections of text which are unreadable by ordinary viewers, then they are probably unreadable by Google’s search algorithm, and therefore cannot be access points.