Sunday, November 3, 2013

Our newspaper problem

When I wrote my sabbatical application, I included a methodology section that carefully listed all the primary sources I hoped to use at each research site. I am told that the thoroughness of my planning was impressive. But now I will let everyone in on a little secret. Fingers crossed that no one would notice, I quietly omitted a substantial group of records from my proposal: newspapers.


Scholars who work mainly with books and scholarly journals may not appreciate the bibliographic "wild west" that local newspapers represent. Engineers may rely on Compendex, and psychologists may trust PsycINFO, but there is no comparable database that contains "most" old newspapers. Today, companies like Gannett and Tribune Company own dozens of media outlets. Yet most small-town papers have been, and continue to be, locally owned, so it is not easy for database aggregators to identify, license, and digitize their content. Hometown publishers sometimes provide online archives themselves ... more often, they do not. In other words, the odds of finding decades of a given newspaper online are poor.

As a faculty member at Penn State, I am very fortunate to have access to Newspaper Archive. For the communities I am researching, this database provides access to the Bedford Gazette (1899-2012); the Connellsville Courier (1879-1977); the Gettysburg Adams Sentinel (1811-1942), Compiler (1857-1950), Gazette (1803-2004), Republican Banner (1833-2012), and Times (1909-2013); the Huntingdon Daily News (1922-2012); the Lebanon Daily News (1872-1970); the New Castle News (1891-2013); the Warren Ledger (1864-1895); and the Williamsport Gazette and Bulletin (1869-1955) and Sun-Gazette (1929-1973). In addition, I can obtain the Harrisburg Patriot (1854-1922), the Philadelphia Inquirer (1860-1922), and the Wilkes-Barre Times-Leader (1892-1922) through another database, America's Historical Newspapers. That said, I have not yet located online backfiles for many other Pennsylvania cities, including Erie and Scranton.

Even for localities that appear to be "covered" by a long run in AHN, Newspaper Archive, or another web site, a question few seem to be asking is whether the title available digitally was considered the "paper of record" in its era. Would most of the town news have been published in it? How accurate and reliable was its reporting? Was it widely read? Many communities published more than one newspaper, sometimes representing different political views, addressing different readerships, or capturing "morning" versus "late-breaking" events. Yet convenience, more than anything else, seems to govern which titles have been digitized. To take one of Pennsylvania's major cities as an example, one web site purports to provide an "Erie Newspaper Archives." Many amateurs might find it sufficient to "tell the story of [their] ancestor's lives as they lived it and watch [their] family history unfold as never before," as the site promises. A look at the fine print, however, shows that the site may be of limited use to professional historians, because neither of Erie's most important papers, the Dispatch or the Times, is included there.

Although I'd prefer to have access to the Warren Mail, I have been using Newspaper Archive's version of the Ledger, which is the only online title that covers the Warren Library Association's early years (1870s-1880s). Hours of hands-on practice have prompted additional concerns about digitized newspapers. It makes for an interesting scene when the peculiarities of 19th-century news writing confront the limitations of today's technology. For example, since fonts and space were limited, editors of yesteryear placed articles wherever they would fit. They frequently published columns consisting of dozens of "newslets," one- or two-sentence blurbs. As shown in the screenshot below, each newslet was a discrete story. I have found them invaluable for establishing the dates of minor events, such as an upcoming library fundraiser (like Warren's "Library Comedy Club"). However, since such small notices lack headlines, they are seldom indexed individually. The only way to reach them is through a full-text search of the entire newspaper, which means researchers like me must wade through thousands of irrelevant articles and advertisements to find them.

"Newslets," such as the ones in the first column of the November 29, 1878 issue of the Warren Ledger, were common in 19th- and early 20th-century newspapers. They are seldom individually indexed in online databases.
Another difficulty is that full-text searching relies on optical character recognition (OCR) software, which converts scanned images to machine-readable text. No human eye has sorted out the abbreviations, hyphenations, spelling mistakes, typographical irregularities, and flyspecks that are all too common in old newspapers. To put it another way, Newspaper Archive is searching machine transcripts of articles, and those transcripts are often riddled with errors. This screenshot illustrates the messiness of OCR-generated content:

Screenshot of a results list from Newspaper Archive. Note several errors in the extracted text from the January 16, 1891 Warren Ledger article (for example, "OfiHcera Keport" should read "Officers Report").

Thus appalled by the eccentric title choices and faulty transcriptions which seem epidemic in historic newspaper databases, I reverted to a method that no one under 50 uses anymore:

The microfilm "hand-search."

That's right: start at January 1st, and doggedly read every article, on every page, of every issue, of reel after reel, year after year.

I decided to test this "old school" method with Clarion, because neither the Clarion Free Library (CFL) nor the Clarion County Historical Society holds significant scrapbooks or vertical file material concerning CFL's early history. Also, neither of Clarion's two major papers, the Democrat and the Republican, appears to be available online. Both titles are on microfilm at the State Library of Pennsylvania (SLP), just a $1.75 bus ride from my house.

Thankfully, SLP now has ScanPro digital readers, which all but eliminate the motion sickness I used to experience with older equipment. Unfortunately, however, the new technology doesn't reduce the time needed to complete a hand-search. After 3 days, I had only looked at Republican issues from 1920 to 1926 and 1929 to 1930. In other words, on average it took me 1 1/2 hours to hand-search a single year of an 8-page weekly, and longer if you factor in requests for microfilm reels from storage, the machine's occasional breakdowns, walks to and from the State Library's circulation desk to retrieve printouts, and a few restroom breaks. This translates to about 100 hours to complete a 50-year run ... of a weekly (never mind a daily!). Because of the strain on my eyes and back, I can realistically sit at the reader for only 5 hours per day. In addition, SLP is open to researchers only 3 days per week. So, calculating everything on a scrap of paper, I figure that hand-searching the Clarion Republican from 1900 to 1950 could be a two-month effort! And then, to be politically balanced, I suppose I'd have to load up the Democrat after that! Yikes. Given the number of libraries I am researching for this project, hand-searching local newspapers is completely infeasible.
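The scrap-of-paper arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope sketch, not a precise model: the function name `hand_search_estimate` and its parameters are illustrative choices, and the figure of about 2 hours per year is an assumption, inferred from the 100-hour estimate, that pads the 1 1/2 hours of actual reading with reel requests, breakdowns, printouts, and breaks.

```python
import math


def hand_search_estimate(years, hours_per_year, hours_per_day, days_per_week):
    """Back-of-the-envelope estimate for a microfilm hand-search.

    Returns (total_hours, reading_days, weeks). All inputs are rough
    assumptions, not measured constants.
    """
    total_hours = years * hours_per_year
    # Round up: a partial day at the reader still costs a trip to the library.
    reading_days = math.ceil(total_hours / hours_per_day)
    weeks = reading_days / days_per_week
    return total_hours, reading_days, weeks


# Observed pace: 1920-1926 plus 1929-1930 is 9 years of issues in 3 days.
observed_years = (1926 - 1920 + 1) + (1930 - 1929 + 1)
print(f"Observed: {observed_years} years of issues in 3 days")

# Assumption: ~2 hours per year once overhead is added to ~1.5 hours of reading.
total, days, weeks = hand_search_estimate(
    years=50, hours_per_year=2, hours_per_day=5, days_per_week=3
)
print(f"Republican alone: {total} hours, {days} days, ~{weeks:.1f} weeks")

# Loading up the Democrat as well roughly doubles the effort.
total2, days2, weeks2 = hand_search_estimate(
    years=100, hours_per_year=2, hours_per_day=5, days_per_week=3
)
print(f"Both papers: {total2} hours, {days2} days, ~{weeks2:.1f} weeks")
```

With these assumptions the Republican alone comes to 100 hours, or 20 five-hour days, which at 3 library days per week is roughly 6.7 weeks, close to the two months estimated above. Adding the Democrat stretches it to about three months.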

So my initial instinct not to commit my sabbatical to searching newspapers was correct. Yet I can't help but wonder how much valuable information I am missing. Will my resulting publications be as definitive as they could be? What to do?

I need to find a strategy for this "newspaper problem."

  1. Hi Bernadette, I'm Rhonda Clark and I teach in the library science dept. at Clarion. I've developed a course in local history/genealogy reference and also one in representing local collections, so the newspaper issue is near and dear to my heart. Plus I'm in the Titusville Historical Society and am sick over the fact that it is so expensive for some local libraries to provide a subscription to Newspaper Archive to their patrons, and, if they do, that the results, as you point out, are often dismal due to ineffective OCR for these fonts and blurry microfilm originals. I hope that Erie Public continues to hold out against vendor "let us microfilm and you will have perpetual use" offers, which would mean giving up copyright. But it does make it difficult for the researcher! I have long argued for an indexing project for the Titusville Herald, but such projects are not funded anymore. Everything, we are told, is digitized, so why bother with an index? And that is so frustrating. I can perhaps recommend a systematic approach to partial use of newspapers. I did my Ph.D. dissertation (UMN 1996) on Russian women publishers 1860-1905 and had three main bodies of info for my 160 women: their censor files, their personal writings, and their sixty or so periodicals. One hundred sixty women, two cities (Moscow, St. Pete), four major libraries/manuscript repositories, five or more archives, and only five months abroad. Well, you get the idea: not possible. So I devised a method to survey the periodicals for specific information relevant to the women as publishers (evidence of active editorial work in formatting, content, letters to the readers, etc.). Some women edited large, illustrated periodicals for decades, so I would sample those years. Since major revisions to format also showed up in the censor apps, this was a bit easier than looking for random info in a paper.
BUT sampling years might give you a general sense of which papers are worth investing more time in and provide a scholarly rationale for doing so. I think you are right that you cannot be exhaustive, but you can develop rationales for choosing particular newspapers and particular segments, even if they just provide some "color" for the writing. Good luck! Sounds wonderful. Oh, and you've likely found it, but a former student of mine from my previous incarnation as a history professor at Mercyhurst, Adam Blahut, wrote an MA thesis at Edinboro (2005) on the Erie Public Library (I have not read it, so cannot comment on its contents), as well as his undergrad thesis at Mercyhurst, which was an overview of the founding of the library. It appeared in the Journal of Erie Studies. And there is an earlier study of the Erie Public Library in that journal as well, as I recall from years ago.