Ten Million and Counting
January 6, 2012
HathiTrust reached a major milestone on January 5, 2012, exceeding 10 million volumes in its digital collections. More than 2.7 million of these volumes are in the public domain, with viewing and downloading options available online. Statistics about the collections and a graph charting growth over time are available below (see also Statistics and Visualizations). We have also prepared a timeline noting significant events on our way to 10 million volumes. As of January 5, 2012, 23 of HathiTrust’s 67 partners are depositing content in the repository. Details on contributions by institution can be found in our monthly updates. See also our News and Publications page for press releases, papers, presentations, and more about HathiTrust over the last several years.
Copyright Distribution by Type
Copyright Distribution by Date
Volume Distribution by Date
Volume Distribution by Language (1)
Volume Distribution by Language (2)
Growth Over Time
Timeline
January 2008
- First formal multi-institutional commitments made to building HathiTrust
March 2008
- First instance of HathiTrust repository infrastructure in place in Ann Arbor, Michigan
- Storage purchased for second instance of repository in Indianapolis
- University of Michigan coordinates site visit by a team from DRAMBORA
- Results of the DRAMBORA review were published as:
Seamus Ross, Andrew McHugh, Perla Innocenti, Raivo Ruusalepp: Investigation of the potential application of the DRAMBORA toolkit in the context of digital libraries to support the assessment of the repository aspects of digital libraries, Glasgow: DELOS NoE, August 2008, ISBN: 2-912335-41-8
April 2008
- Loading and testing of Google-digitized content from the University of Wisconsin begins
- Preparations begin to establish second instance of repository in Indianapolis
May 2008
- Testing of Lucene/Solr begins to provide full-text search across the repository
- PageTurner application released with specialized accessible interface, allowing reading and full-text searching of individual volumes in the repository
June 2008
- Lucene/Solr installed on development and production servers
- Collection Builder application released
July 2008
- Ingest of content begins from the University of Wisconsin
- Tab-delimited metadata files are made available to facilitate local loading of HathiTrust bibliographic records
- Read more about HathiTrust Data Availability and APIs
August 2008
- HathiTrust “about” website is released, including information about HathiTrust compliance with criteria for Trustworthy Digital Repositories (TRAC) and other documentation
- Benchmarking for full-text search indexing begins
September 2008
- Plans initiated to enable distributed development of applications and services by partner institutions
- 3-prong strategy: to enable access to the PageTurner via an API, to create a development ‘sandbox’ for shared development, and to develop a public discovery interface for the repository
October 2008
- HathiTrust formally launched, including the institutions of the CIC, the University of California system, and the University of Virginia
- Storage is installed at the Indiana site, and an additional 90 TB of storage is installed at both instances, bringing capacity at each site to 190 TB
- Public beta full-text search application released, allowing full-text search of 500,000 volumes
November 2008
- Data synchronization between Michigan and Indiana sites is completed and routinized
December 2008
- Agreement concluded with OCLC to create discovery interface for HathiTrust
- Indiana site becomes fully operational mirror of storage at Michigan site
January 2009
- Load testing for full-text search begins
February 2009
- Work begins on temporary beta catalog interface for HathiTrust
March 2009
- Redundancy (in Indiana) for Web hosting infrastructure and full-text search indexing is established
- Sample datasets containing full-text OCR of repository volumes are made available to researchers
- New storage purchased, bringing total capacity at each site to 320TB
April 2009
- Temporary beta catalog released
- Ingest of Google-digitized content from Indiana University and the University of California begins
May 2009
- HathiTrust Research Center and Collaborative Development Environment working groups launched
- The groups are charged, respectively, with developing specifications for a HathiTrust Research Center and establishing a collaborative development environment for the HathiTrust repository
- Alpha version of Data API released
- Michigan ingests legacy digital collections into the repository to pilot non-Google ingest
June 2009
- California Digital Library begins work on improvements to PageTurner application
- A record 379,000 volumes are ingested in June
July 2009
- Working group formed to investigate need for 3rd instance of storage
August 2009
- Report released on HathiTrust Disaster preparedness
- HathiTrust releases METS profile version 1.0
September 2009
- University of Michigan Press opens access to backfile publications in HathiTrust
- UM and CDL staff begin collaboration for ingest of Internet Archive-digitized materials
- Michigan staff contribute common-grams code to Solr code base
October 2009
- Ingest of content begins from Penn State
- Ingest of content begins from UC Santa Cruz and UC San Diego
- A record 553,963 volumes are ingested in October
November 2009
- Full-text search released (across 4.6 million volumes)
- See the Full-text Search Blog
December 2009
- Columbia University joins HathiTrust
- Center for Research Libraries begins audit of HathiTrust for compliance with TRAC
- See the HathiTrust TRAC documentation for information and results.
- HathiTrust Bibliographic API released
- HathiTrust begins work to implement Shibboleth
- View information about Shibboleth in HathiTrust
- Redundancy of search index established at Indiana site
January 2010
- Executive Committee approves new pricing model for HathiTrust
- The new model allows participation of institutions that do not have large amounts of digital content to contribute. View the new pricing model FAQ.
- Storage Working Group submits final report to Executive Committee
February 2010
- Sample of IA-digitized volumes from UC ingested for testing
- Ingest of Google-digitized volumes begins from the University of Minnesota
- Full-text search index exceeds Solr/Lucene’s limit of 2.1 billion unique terms
- Lucene core developer Michael McCandless creates patch allowing up to 274 billion. View the full-text search blog post.
March 2010
- UM staff receive samples of locally-digitized materials from several CIC institutions (Iowa, Illinois, Northwestern) to begin working on scalable mechanisms and processes for ingesting locally-digitized content
- OCLC begins loading records for HathiTrust volumes into WorldCat
April 2010
- Ingest begins of an initial set of nearly 100,000 IA-digitized volumes from the University of California
May 2010
- New York Public Library joins HathiTrust
- HathiTrust passes 6 million total volumes and 1 million volumes in the public domain
- Executive Committee launches Communications Working Group
June 2010
- HathiTrust enables authentication via Shibboleth
- In the short run, this allows partners to download full PDFs of all public domain materials in the repository and use the Collections application through a local sign-on. Implementation of Shibboleth paves the way for future partner services, such as expanded access to in-copyright materials.
- Full-text search index is mirrored at Indiana site
July 2010
- Yale University Library joins HathiTrust
- Strategic Advisory Board launches Collections Committee
- Executive Committee launches User Experience Advisory Group
- Collection-building functionality integrated into full-text search
August 2010
- Princeton University Library joins HathiTrust
- Ingest of Google- and Internet Archive-digitized volumes from Columbia University begins
- HathiTrust adds 160 TB of new storage, bringing total capacity at each site to 475 TB
- October 31 deadline announced for joining HathiTrust to participate in “constitutional convention” of partners in 2011
September 2010
- The Triangle Research Libraries Network and Dartmouth College join HathiTrust
- Ingest of content begins from New York Public Library and the University of Illinois
October 2010
- HathiTrust announces the 52 partners that will take part in 2011 Constitutional Convention
- Newly announced partners include:
- Baylor University
- Emory University
- Harvard University Library
- Johns Hopkins University
- Library of Congress
- Massachusetts Institute of Technology
- New York University
- Stanford University Library
- Texas A&M University
- Universidad Complutense de Madrid
- University of Maryland
- University of Pennsylvania
- University of Pittsburgh
- University of Utah
- University of Washington
- Utah State University
- Image ingest pilot begins
- The University of Minnesota, Minnesota Historical Society, and Minnesota Digital Library begin working with staff at Michigan to develop a prototype workflow for depositing images and associated metadata into the HathiTrust system for access, storage, and preservation purposes. Read more about the project.
- California Digital Library begins work on a new bibliographic data management system for HathiTrust
- Discovery Interface Working Group charges Full-text Search sub-group
- Ingest begins of content from Princeton University and the University of Chicago
- Collaborative Development Environment is released, used actively for development, testing, and release of code for HathiTrust systems
November 2010
- Ingest from Cornell University begins
December 2010
- Policy and specifications framework for ingest of locally-digitized materials is finalized
- HathiTrust begins working with CIC institutions on ingest of locally-digitized content
January 2011
- OCLC releases WorldCat Local prototype catalog for HathiTrust
- HathiTrust ingests nearly 60,000 images and associated metadata from the University of Minnesota and partners
- HathiTrust adds support for rights holders to open access to works with Creative Commons licenses
- The Brooklyn Museum, Society of American Archivists and many others are early adopters. View the rights holder Permissions Agreement.
February 2011
- HathiTrust makes datasets of public domain materials available on a large scale
- See HathiTrust Datasets for more information
March 2011
- HathiTrust certified by the Center for Research Libraries as a Trustworthy Digital Repository
- Ingest from the Library of Congress begins
- HathiTrust signs agreement with ProQuest to make the HathiTrust full-text index available via Serials Solutions’ Summon service
- Executive Committee launches User Support Working Group
April 2011
- HathiTrust releases new viewing functionality in PageTurner application
- See the Update on April 2011 Activities for details
- Ingest from Harvard University begins
- HathiTrust concludes first storage replacement cycle, replacing storage purchased in 2007
- Planning begins for the HathiTrust Constitutional Convention
May 2011
- HathiTrust begins investigation to identify orphan works in HathiTrust
- Ingest of content from University of Virginia begins
June 2011
- Boston University and Lafayette College join HathiTrust
- UM announces plans to provide access to orphan works to partner institutions
- The HathiTrust Research Center is launched, led by Indiana University and the University of Illinois
- HathiTrust begins ingest of materials digitized by Yale University Library
- “Perspectives on HathiTrust” blog is launched, with inaugural post on HathiTrust and Discovery by John Wilkin
July 2011
- The University of Notre Dame and University of Florida join HathiTrust
- 3-year review of HathiTrust is posted on the HathiTrust website and distributed to partners
- The 3-year review was prepared by Ithaka S+R with oversight by the Strategic Advisory Board in advance of the Constitutional Convention to lay the groundwork for discussions about HathiTrust’s future. View the 3-year review and the Constitutional Convention information page.
- HathiTrust posts the first set of orphan candidate works
- HathiTrust releases improvements to the Collections application interface and full-text search
- Improvements to full-text search include the 2 highest priorities from a full-text search features analysis prepared by the Full-text Search Working Group: the incorporation of bibliographic metadata into the full-text index to allow faceting of results by bibliographic data and improved search results ranking.
- First version of partner print holdings database released
- The holdings database is to act as the basis for the new pricing model, and expanded access to in-copyright materials for members of partner institutions. See the Update on July 2011 Activities for more information.
- The HathiTrust Research Center receives a $600,000 grant from the Sloan Foundation to investigate “non-consumptive” research
- The term “non-consumptive” was first used in the proposed Google Settlement to refer to computational research performed on in-copyright works in such a way that significant reading or “consumption” of the works does not occur.
August 2011
- University of Connecticut joins HathiTrust
- Cornell, Duke, Johns Hopkins, Emory University, and the University of California system announce participation in the Orphan Works Project
- View information about the proposed terms of access to orphan works. See also the Orphan Works Project page on the University of Michigan Library website. Note: No orphan works are currently available in HathiTrust (as of January 6, 2012).
- Proposal to establish print monographs archive distributed to partners
- The proposal is submitted by the Collections Committee for the Constitutional Convention. View the final accepted proposal and the Constitutional Convention information page.
- HathiTrust releases mobile interfaces for catalog and PageTurner applications
- HathiTrust begins ingest of rare books and incunabula digitized by Universidad Complutense de Madrid
- HathiTrust begins working with the University of Pittsburgh and University of Utah on ingest of locally-digitized materials
- HathiTrust begins ingest of Utah State University Press backfile publications, to be made available in HathiTrust on an open access basis
- HathiTrust begins ingest of Google-digitized volumes from Northwestern University and Purdue University, and Internet Archive-digitized volumes from North Carolina State University
- HathiTrust concludes agreements with OCLC and EBSCO to make the HathiTrust full-text index available via their discovery services
September 2011
- The University of Connecticut and University of Missouri join HathiTrust
- HathiTrust, Google, and Duke University Press sign agreement to open access to DUP backfile volumes in HathiTrust under Creative Commons licenses
- The Authors Guild and others file a lawsuit against HathiTrust alleging copyright infringement
- HathiTrust begins working with the University of Florida and the University of North Carolina-Chapel Hill on ingest of locally-digitized materials
- Partners submit final ballot proposals for the Constitutional Convention. 7 are submitted in all.
- View the proposals and the Constitutional Convention information page.
October 2011
- The University of Miami and University of Arizona join HathiTrust
- The Constitutional Convention takes place; 5 out of 7 ballot initiatives are passed
- View the blog post about the Convention, and the Constitutional Convention information page which includes notes from the Convention.
- Ingest of Internet Archive-digitized content begins from Duke University and University of North Carolina-Chapel Hill
November 2011
- Boston College joins HathiTrust
- The University of California begins offering reprints of UC-digitized public domain materials via HathiTrust
- The User Experience Advisory Group releases HathiTrust User Personas
January 2012
- HathiTrust reaches 10 million volumes
February 2014
- HathiTrust reaches 11 million volumes