Newspaper digitization

Title: Newspaper digitization  
Author: World Heritage Encyclopedia
Language: English
Subject: Newspapers, Digitizing, Digital libraries
Publisher: World Heritage Encyclopedia

Newspaper digitization is the process of converting old newspapers that survive in analog form into digital images. The most common analog forms for old newspapers are paper and microfilm. Digitized images of newspaper pages are typically (though not always) analyzed with OCR software to produce text files of the newspaper content. Newspaper digitization is a special case of digitization in general.

Newspapers preserve a rich record of the past, and since the advent of digital media, many institutions across the world have begun to digitize them and make the digital files publicly available. However, over 90% of newspapers remained unscanned in 2015.[1] Digitized newspapers may be made available for free or for a fee. Several lists (noted below) try to catalog digitized newspapers worldwide.

Successful newspaper scanning is a complex activity. Although scanning from paper is possible, microfilm scanning is cheaper, and good microfilm has been called “the single most critical factor in the success of newspaper digitization.”[2] The OCR analysis of scanned pages presents a number of technical challenges, and the text of old newspapers is often difficult to read, which introduces recognition errors and complicates searching. Attaching metadata to images to make them easier to find is another important step. Finally, search interfaces must be developed. A number of companies specialize in newspaper scanning, and some produce software specially designed for the process.
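
As a rough illustration of why OCR errors complicate searching, and of how metadata attached to each page helps identify a hit, the following minimal Python sketch compares an exact search with a tolerant one. Everything in it is an assumption made for illustration: the `Page` record, the sample newspaper title and text, the `fuzzy_contains` helper, and the 0.7 similarity threshold are hypothetical and do not describe any particular digitization project or product.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Page:
    """One scanned page: descriptive metadata plus its (possibly noisy) OCR text."""
    title: str      # newspaper title (hypothetical sample data)
    date: str       # issue date in ISO format
    page: int       # page number within the issue
    ocr_text: str   # text produced by OCR, errors included


def fuzzy_contains(text: str, term: str, threshold: float = 0.7) -> bool:
    """Return True if any word in `text` is similar enough to `term`.

    Similarity is difflib's SequenceMatcher ratio; the threshold is an
    arbitrary value chosen for this example, not a recommended setting.
    """
    term = term.lower()
    for word in text.lower().split():
        word = word.strip('.,;:!?"\'()')
        if SequenceMatcher(None, word, term).ratio() >= threshold:
            return True
    return False


def search(pages, term, threshold=0.7):
    """Return the pages whose OCR text approximately contains `term`."""
    return [p for p in pages if fuzzy_contains(p.ocr_text, term, threshold)]


# Sample data: the OCR engine has misread "mayor" as "rnayor" on the first page.
pages = [
    Page("Daily Example", "1901-03-14", 1, "The rnayor opened the new bridge today."),
    Page("Daily Example", "1901-03-15", 2, "Wheat prices rose sharply this week."),
]

# An exact substring search misses the misread word entirely.
exact = [p for p in pages if "mayor" in p.ocr_text.lower()]

# A tolerant search still finds the page, and the metadata says where it is.
fuzzy = search(pages, "mayor")
for hit in fuzzy:
    print(f"{hit.title}, {hit.date}, page {hit.page}")
```

Word-by-word similarity matching is only one possible way to tolerate OCR noise. It is used here because it needs nothing beyond the Python standard library, not because it is how production newspaper search systems work.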

The cost of storing printed newspapers and the relatively low demand for originals after microfilming and scanning mean that printed newspapers, once microfilmed or scanned, have often been discarded. Some people feel that this is a loss for researchers, or simply that there is a poignancy when the experience of reading on paper disappears. Author Nicholson Baker went so far as to create a paper newspaper archive, which he called the American Newspaper Repository, to preserve paper newspapers that would otherwise be discarded.

More recent newspapers may have been "born digital," meaning that they were printed from computer files rather than by letterpress or phototypesetting. They can be archived by storing the publisher's digital files of each page image rather than scanning the pages.

Finding aids and metasearch engines

  • Worldwide list, maintained at WorldHeritage.
  • Worldwide list of newspaper digitization projects at the Center for Research Libraries, International Coalition on Newspapers.
  • A website that provides a free metasearch service for several large collections (mostly Australian and American).


  1. ^ Center for Research Libraries, "The state of the art: a comparative analysis of newspaper digitization to date", 10 April 2015 (PDF).
  2. ^ "Best Practices for Newspaper Digitization", chapter 4 in "Best Practices for Creating Digital Collections", University of Illinois at Urbana-Champaign.

External links

  • Kenning Arlitsch and John Herbert, "Microfilm, paper, and OCR: issues in newspaper digitization" Microform & Imaging Review, 33, 2 (2003): 59-67. (Early review of newspaper digitization.)
  • Edwin Klijn, "The current state-of-art in newspaper digitization: a market perspective" D-Lib Magazine, 14, 1-2 (January-February 2008).
  • Center for Research Libraries, "The state of the art: a comparative analysis of newspaper digitization to date", 10 April 2015.