One of the questions we keep returning to throughout the course is whether digital history has actually changed the questions, problems, and opportunities facing historians, or whether these are simply the same issues masquerading as novel ones. In “Confronting the Digital,” Tim Hitchcock makes the case for significant change.
According to Hitchcock, historians have not responded effectively to the fact that “Algorithm-driven discovery and misleading forms of search, poor OCR, and all the selection biases of a new edition of the Western print archive have changed how we research the past, and the underlying character of the object of study (inherited text).” One of the problems is that the poor quality of most OCR means that “While we think we are searching newspapers, we are actually searching markedly inaccurate representations of text, hidden behind a poor quality image.” Although searching in archives has always had its problems, Hitchcock is right to point this out: historians have been trained to use archives, but they have generally not been trained to be critical users of internet searches. Most people have not. We tend to assume that internet searches return accurate results because most of us do not really know how they work. Even though I read a book on PageRank in college, I still did not know, as Hitchcock points out, “that Google does not actually count results, although it appears to do so. Instead, it estimates the numbers of hits on the basis of the speed with which it locates the first few instances.”
Hitchcock also argues for more transparency from historians about their search processes and use of online materials. As he explains, “I have yet to see a piece of academic history that is explicit about its reliance on keyword search and electronic sources. As editors and authors, we accept and write footnotes that misrepresent the research process. As teachers, we largely fail to instil a critical engagement with the real electronic text we ask our students to read, and instead encourage them to pretend they are sitting with a dusty book in their hands.” While I don’t really see more transparency catching on in academic texts in the near future, especially given how clunky the first attempts would likely be, I do see more room for transparency in education. Yes, as an undergraduate, I sat through many workshops with librarians on how to use the library catalog, but we very rarely addressed Google, and when we did, it was almost always in the context of Google Scholar. I also don’t recall explicit instructions from my history professors about how to use online searching to find sources, especially primary sources, yet I am sure they have plenty of personal experience with it. So I agree with Hitchcock in this respect. When it comes to doing history, I don’t trust the smattering of search tactics that I have picked up trying to find a birthday gift or somewhere to eat; I wish I had been trained to search in the context of historical scholarship, rather than being left to my own devices as a “digital native.” Historians have never been trained in online searching to the extent that they have been trained to use libraries and archives. It is therefore important that historians stop obscuring their use of these searches, so that we can open up a necessary dialogue.
Photo: Thomas Jefferson’s Library exhibition at the Library of Congress. Jefferson had his own classification scheme, but the image nevertheless reminds me of a quote from Hitchcock: “Embedded within the Dewey Decimal and Library of Congress systems of classification (and in all their less successful imitators) are clear disciplinary boundaries which constrain how a reader imagines their topic and the intellectual landscape through which they navigate. Academic history was built on this lattice-work of understanding and has traditionally been constrained within it – if we are experts in anything, it is in how to use a library and archive. The advent of keyword searching lets us escape this post-Enlightenment knowledge system, but it also removes the framework of source criticism and classification that we have come to rely upon, and which we silently assume is shared with our readers, when we reference a specific edition or archive.”