
Archive site

Article Id: WHEBN0002643012

Title: Archive site  
Author: World Heritage Encyclopedia
Language: English
Subject: Internet censorship circumvention, Archive (disambiguation), Editing policy, Data management, Web archiving
Collection: Data Management, Online Archives, Web Archiving Initiatives
Publisher: World Heritage Encyclopedia


In web archiving, an archive site is a website that stores information on webpages from the past for anyone to view.


  • Common techniques
  • Examples
    • Google Groups
    • Internet Archive
    • NBCUniversal Archives
    • Nextpoint
    • PANDORA Archive
  • See also
  • References

Common techniques

Two common techniques for archiving web sites are using a web crawler or soliciting user submissions:

  1. Using a web crawler: A service that uses a web crawler (as the Internet Archive does) does not depend on an active community for its content, and can therefore build a larger database faster. A larger database may in turn attract a larger community. However, web crawlers can only index and archive information that the public has chosen to post to the Internet and that is available to be crawled, because web site developers and system administrators can block web crawlers from accessing certain web pages (using a robots.txt file).
  2. User submissions: A user-submission service can be difficult to start because submission rates may be low at first, but this approach can yield some of the best results. Crawling captures only what has already been posted online, and potential content providers may not post certain information because they assume no one would be interested in it, because they lack a proper venue in which to post it, or because of copyright concerns.[1] Users who see that someone wants their information may be more apt to submit it.
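The robots.txt restriction described in the first technique can be illustrated with a minimal Python sketch. It assumes a hypothetical robots.txt file, a hypothetical crawler name ("example-archiver"), and placeholder example.org URLs, none of which come from the article. It uses the standard library's urllib.robotparser and parses the rules from a string, so it makes no network requests. It shows only the permission check, not how any particular archive site implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that a site administrator might publish
# to keep crawlers out of one directory (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def archivable_urls(robots_txt, user_agent, urls):
    """Return the URLs that robots_txt allows user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Placeholder URLs; a real crawler would discover these by following links.
candidates = [
    "https://example.org/index.html",
    "https://example.org/private/notes.html",
]

allowed = archivable_urls(ROBOTS_TXT, "example-archiver", candidates)
print(allowed)  # ['https://example.org/index.html']
```

A crawler that honors these rules would skip the page under /private/, so that page would be absent from the archive unless someone submitted it by other means.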


Examples

Google Groups

On February 12, 2001, Google acquired a Usenet discussion group archive and turned it into its Google Groups service [1]. The service lets users search old discussions with Google's search technology while still allowing them to post to the groups.

Internet Archive

The Internet Archive is building a compendium of websites and digital media. Since 1996, the Archive has used a web crawler to build up its database. It is one of the best-known archive sites.

NBCUniversal Archives

NBCUniversal Archives offers access to exclusive content from NBCUniversal and its subsidiaries. Its official website provides easy viewing of past and recent news clips, and it is a prime example of a news archive.


Nextpoint

Nextpoint offers an automated, cloud-based SaaS product for marketing, compliance, and litigation-related needs, including electronic discovery.


PANDORA Archive

PANDORA (Pandora Archive), founded in 1996 by the National Library of Australia, stands for Preserving and Accessing Networked Documentary Resources of Australia, a name that encapsulates its mission. It provides a long-term catalog of selected online publications and web sites that are authored by Australians or are about Australian topics. It uses PANDAS (PANDORA Digital Archiving System) to build its catalog.

Another archive site is a large library of old text files maintained by Jason Scott Sadofsky. Its mission is to archive the old documents that had floated around the bulletin board systems (BBS) of his youth and to document other people's experiences on bulletin board systems.

See also


References

  1. Jinfang Niu, University of South Florida (March–April 2012). "An Overview of Web Archiving". D-Lib Magazine 18 (3/4).