The Bibliotheca Alexandrina Internet Archive is a backup of the Internet Archive (IA), which was originally created in San Francisco. The IA holds records of web pages from 1996 to the present, while the Bibliotheca Alexandrina (BA) Internet Archive holds records of web pages from 1996 to 2007. The BA also holds information from the Middle East and Africa.
The IA harvests its information with Heritrix, an open-source, extensible, web-scale, archival-quality web crawler. Heritrix collects only material available over HTTP/HTTPS, DNS, and FTP, and it is free for anyone to download and use. The other technology the IA relies on is the Wayback Machine, which allows users to search through archived web sites.
Storing the Archive's collections involves parsing, indexing, and physically encoding the data. With the Internet collections growing at exponential rates, this task poses an ongoing challenge. The IA stores its growing collections on hardware consisting of PCs with clusters of IDE hard drives. Data is stored on DLT tape and hard drives in various formats, depending on the collection. Web data is received and stored in an archive format: ARC files of 100 megabytes each, every one made up of many individual files.
A major concern is that the stored data could somehow be destroyed. One way the IA guards against accidental loss is by keeping multiple copies of the same information. Because technology changes so quickly, the IA is also developing emulators so that future researchers will be able to access and use the information stored in the archives.
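The idea of packing many individual files into archive files of about 100 megabytes can be illustrated with a minimal sketch. It does not implement the actual ARC file format; the function name batch_records, the (name, payload) record shape, and the decimal interpretation of "100 megabytes" are all assumptions made only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "100 megabytes" is read as 100 * 10^6 bytes.
ARC_TARGET_BYTES = 100 * 1000 * 1000

Record = Tuple[str, bytes]  # hypothetical (name, payload) pair


def batch_records(records: Iterable[Record],
                  limit: int = ARC_TARGET_BYTES) -> Iterator[List[Record]]:
    """Group records into batches whose total payload size stays within limit.

    A single record larger than the limit is placed in a batch of its own.
    """
    batch: List[Record] = []
    size = 0
    for name, payload in records:
        # Close the current batch if adding this record would exceed the limit.
        if batch and size + len(payload) > limit:
            yield batch
            batch, size = [], 0
        batch.append((name, payload))
        size += len(payload)
    if batch:
        yield batch


if __name__ == "__main__":
    # Tiny demonstration with a 100-byte limit instead of 100 megabytes.
    demo = [("a", b"x" * 60), ("b", b"x" * 50), ("c", b"x" * 40)]
    for group in batch_records(demo, limit=100):
        print([name for name, _ in group])
```

Each batch produced this way would correspond to one archive file; a real crawler would also write headers and metadata for every record, which this sketch leaves out.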
The Bibliotheca Alexandrina Internet Archive mirrors the policies, collection methods, and goals that the IA states on its main US domain. At the BA Internet Archive, however, the information is stored using the petabox, a machine designed to safely store and process one million gigabytes (one petabyte) of data. The machine features low power consumption, support for multiple operating systems, easy maintenance, and software to automate mirroring.
The archive at the Bibliotheca Alexandrina includes 70 billion web pages covering the period 1996–2007, 2,000 hours of Egyptian and US television broadcasts, 1,000 archival films, and 25,000 digitized books acquired through the Open Content Alliance consortium. It is capable of storing 3.7 petabytes of data on 1,636 computers. The BA is one of the leading libraries and archives outside the US and is actively collecting content from the Middle East and Africa. As of last year the Library of Alexandria reached 10 petabytes of information, and the founders of the Archive hope to continue documenting cultural content through its archiving process.
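To put the stated capacity in perspective, the short calculation below estimates the average storage per machine from the two figures given above (3.7 petabytes across 1,636 computers). It assumes decimal units (1 petabyte = 10^15 bytes, 1 terabyte = 10^12 bytes) and an even distribution across machines, neither of which the source confirms; the function name is hypothetical.

```python
PETABYTE = 10**15  # assumption: decimal (SI) units
TERABYTE = 10**12


def average_per_machine_tb(capacity_pb: float, machines: int) -> float:
    """Return the average capacity per machine in terabytes."""
    return capacity_pb * PETABYTE / machines / TERABYTE


if __name__ == "__main__":
    # Figures from the text: 3.7 petabytes on 1,636 computers.
    print(f"{average_per_machine_tb(3.7, 1636):.2f} TB per machine")
```

Under these assumptions the average comes to a little over 2 terabytes per computer.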
Bibliotheca Alexandrina http://www.bibalex.org/Home/Default_EN.aspx