Ten Million and Counting

January 6, 2012

HathiTrust reached a major milestone on January 5, 2012, exceeding 10 million volumes in its digital collections. More than 2.7 million of these volumes are in the public domain, with viewing and downloading options available online. Statistics about the collections and a graph charting growth over time are available below (see also Statistics and Visualizations). We have also prepared a timeline noting significant events on our way to 10 million volumes. As of January 5, 2012, 23 of HathiTrust’s 67 partners are depositing content in the repository. Details on contributions by institution can be found in our monthly updates. See also our News and Publications page for press releases, papers, presentations, and more about HathiTrust over the last several years.

 

Copyright Distribution by Type

Copyright Distribution by Date

Volume Distribution by Date

Volume Distribution by Language (1)

Volume Distribution by Language (2)

Growth Over Time

Timeline

January 2008

  • First formal multi-institutional commitments made to building HathiTrust

March 2008

  • First instance of HathiTrust repository infrastructure in place in Ann Arbor, Michigan
  • Storage purchased for second instance of repository in Indianapolis
  • University of Michigan coordinates site visit by a team from DRAMBORA
    • Results of the DRAMBORA review were published as: Seamus Ross, Andrew McHugh, Perla Innocenti, Raivo Ruusalepp: Investigation of the potential application of the DRAMBORA toolkit in the context of digital libraries to support the assessment of the repository aspects of digital libraries, Glasgow: DELOS NoE, August 2008, ISBN: 2-912335-41-8

April 2008

  • Loading and testing of Google-digitized content from the University of Wisconsin begins
  • Preparations begin to establish second instance of repository in Indianapolis

May 2008

  • Testing of Lucene/Solr begins to provide full-text search across the repository
  • PageTurner application released with specialized accessible interface, allowing reading and full-text searching of individual volumes in the repository

June 2008

  • Lucene/Solr installed on development and production servers
  • Collection Builder application released

July 2008

August 2008

  • HathiTrust “about” website is released, including information about HathiTrust compliance with criteria for Trustworthy Digital Repositories (TRAC) and other documentation
  • Benchmarking for full-text search indexing begins

September 2008

  • Plans initiated to enable distributed development of applications and services by partner institutions
    • Three-pronged strategy: enable access to PageTurner via an API, create a development ‘sandbox’ for shared development, and develop a public discovery interface for the repository

October 2008

  • HathiTrust formally launched, including the institutions of the CIC, the University of California system, and the University of Virginia
  • Storage is installed at the Indiana site, and an additional 90 TB of storage is installed at both instances, bringing capacity at each site to 190 TB
  • Public beta full-text search application released, allowing full-text search of 500,000 volumes

November 2008

  • Data synchronization between Michigan and Indiana sites is completed and routinized

December 2008

  • Agreement concluded with OCLC to create discovery interface for HathiTrust
  • Indiana site becomes fully operational mirror of storage at Michigan site

January 2009

  • Load testing for full-text search begins

February 2009

  • Work begins on temporary beta catalog interface for HathiTrust

March 2009

  • Redundancy (in Indiana) for Web hosting infrastructure and full-text search indexing is established
  • Sample datasets containing full-text OCR of repository volumes are made available to researchers
  • New storage purchased, bringing total capacity at each site to 320 TB

April 2009

  • Temporary beta catalog released
  • Ingest of Google-digitized content from Indiana University and the University of California begins

May 2009

  • HathiTrust Research Center and Collaborative Development Environment working groups launched
    • The groups are charged, respectively, to develop specifications for a HathiTrust Research Center and to establish a collaborative development environment for the HathiTrust repository
  • Alpha version of Data API released
  • Michigan ingests legacy digital collections into the repository to pilot non-Google ingest

June 2009

  • California Digital Library begins work on improvements to PageTurner application
  • A record 379,000 volumes are ingested in June

July 2009

  • Working group formed to investigate need for 3rd instance of storage

August 2009

September 2009

  • University of Michigan Press opens access to backfile publications in HathiTrust
  • UM and CDL staff begin collaboration for ingest of Internet Archive-digitized materials
  • Michigan staff contribute common-grams code to Solr code base

October 2009

  • Ingest of content begins from Penn State
  • Ingest of content begins from UC Santa Cruz and UC San Diego
  • A record 553,963 volumes are ingested in October

November 2009

December 2009

  • Columbia University joins HathiTrust
  • Center for Research Libraries begins audit of HathiTrust for compliance with TRAC
  • HathiTrust Bibliographic API released
  • HathiTrust begins work to implement Shibboleth
  • Redundancy of search index established at Indiana site

January 2010

  • Executive Committee approves new pricing model for HathiTrust
  • Storage Working Group submits final report to Executive Committee

February 2010

  • Sample of IA-digitized volumes from UC ingested for testing
  • Ingest of Google-digitized volumes begins from the University of Minnesota
  • Full-text search index exceeds Solr/Lucene’s limit of 2.1 billion unique terms

March 2010

  • UM staff receive samples of locally-digitized materials from several CIC institutions (Iowa, Illinois, Northwestern) to begin working on scalable mechanisms and processes for ingesting locally-digitized content
  • OCLC begins loading records for HathiTrust volumes into WorldCat

April 2010

  • Ingest begins of an initial set of nearly 100,000 IA-digitized volumes from the University of California

May 2010

  • New York Public Library joins HathiTrust
  • HathiTrust passes 6 million total volumes and 1 million volumes in the public domain
  • Executive Committee launches Communications Working Group

June 2010

  • HathiTrust enables authentication via Shibboleth
    • In the short run, this allows partners to download full PDFs of all public domain materials in the repository and to use the Collections application through a local sign-on. Implementation of Shibboleth paves the way for future partner services, such as expanded access to in-copyright materials.
  • Full-text search index is mirrored at Indiana site

July 2010

August 2010

  • Princeton University Library joins HathiTrust
  • Ingest of Google- and Internet Archive-digitized volumes from Columbia University begins
  • HathiTrust adds 160 TB of new storage, bringing total capacity at each site to 475 TB
  • October 31 deadline announced for joining HathiTrust to participate in “constitutional convention” of partners in 2011

September 2010

  • The Triangle Research Libraries Network and Dartmouth College join HathiTrust
  • Ingest of content begins from New York Public Library and the University of Illinois

October 2010

  • HathiTrust announces the 52 partners that will take part in 2011 Constitutional Convention
    • Newly announced partners include:
      • Baylor University
      • Emory University
      • Harvard University Library
      • Johns Hopkins University
      • Library of Congress
      • Massachusetts Institute of Technology
      • New York University
      • Stanford University Library
      • Texas A&M University
      • Universidad Complutense de Madrid
      • University of Maryland
      • University of Pennsylvania
      • University of Pittsburgh
      • University of Utah
      • University of Washington
      • Utah State University
  • Image ingest pilot begins
    • The University of Minnesota, Minnesota Historical Society, and Minnesota Digital Library begin working with staff at Michigan to develop a prototype workflow for depositing images and associated metadata into the HathiTrust system for access, storage, and preservation purposes. Read more about the project.
  • California Digital Library begins work on a new bibliographic data management system for HathiTrust
  • Discovery Interface Working Group charges Full-text Search sub-group
  • Ingest begins of content from Princeton University and the University of Chicago
  • Collaborative Development Environment is released, used actively for development, testing, and release of code for HathiTrust systems

November 2010

  • Ingest from Cornell University begins

December 2010

  • Policy and specifications framework for ingest of locally-digitized materials is finalized
  • HathiTrust begins working with CIC institutions on ingest of locally-digitized content

January 2011

  • OCLC releases WorldCat Local prototype catalog for HathiTrust
  • HathiTrust ingests nearly 60,000 images and associated metadata from the University of Minnesota and partners
  • HathiTrust adds support for rights holders to open access to works with Creative Commons licenses

February 2011

  • HathiTrust makes datasets of public domain materials available on a large scale

March 2011

  • HathiTrust certified by the Center for Research Libraries as a Trustworthy Digital Repository
  • Ingest from the Library of Congress begins
  • HathiTrust signs agreement with ProQuest to make the HathiTrust full-text index available via Serials Solutions’ Summon service
  • Executive Committee launches User Support Working Group

April 2011

  • HathiTrust releases new viewing functionality in PageTurner application
  • Ingest from Harvard University begins
  • HathiTrust concludes first storage replacement cycle, replacing storage purchased in 2007
  • Planning begins for the HathiTrust Constitutional Convention

May 2011

  • HathiTrust begins investigation to identify orphan works in HathiTrust
  • Ingest of content from University of Virginia begins

June 2011

  • Boston University and Lafayette College join HathiTrust
  • UM announces plans to provide access to orphan works to partner institutions
  • The HathiTrust Research Center is launched, led by Indiana University and the University of Illinois
  • HathiTrust begins ingest of materials digitized by Yale University Library
  • “Perspectives on HathiTrust” blog is launched, with inaugural post on HathiTrust and Discovery by John Wilkin

July 2011

  • The University of Notre Dame and University of Florida join HathiTrust
  • 3-year review of HathiTrust is posted on the HathiTrust website and distributed to partners
    • The 3-year review was prepared by Ithaka S+R with oversight by the Strategic Advisory Board in advance of the Constitutional Convention to lay the groundwork for discussions about HathiTrust’s future. View the 3-year review and the Constitutional Convention information page.
  • HathiTrust posts the first set of orphan candidate works
  • HathiTrust releases improvements to the Collections application interface and full-text search
    • Improvements to full-text search include the two highest priorities from a full-text search features analysis prepared by the Full-text Search Working Group: incorporation of bibliographic metadata into the full-text index to allow faceting of results by bibliographic data, and improved ranking of search results.
  • First version of partner print holdings database released
  • The HathiTrust Research Center receives a $600,000 grant from the Sloan Foundation to investigate “non-consumptive” research
    • The term “non-consumptive” was first used in the proposed Google Settlement to refer to computational research performed on in-copyright works in such a way that significant reading or “consumption” of the works does not occur.

August 2011

  • University of Connecticut joins HathiTrust
  • Cornell, Duke, Johns Hopkins, Emory University, and the University of California system announce participation in the Orphan Works Project
    • View information about the proposed terms of access to orphan works. See also the Orphan Works Project page on the University of Michigan Library website. Note: No orphan works are currently available in HathiTrust (as of January 6, 2012).
  • Proposal to establish print monographs archive distributed to partners
  • HathiTrust releases mobile interfaces for catalog and PageTurner applications
  • HathiTrust begins ingest of rare books and incunabula digitized by Universidad Complutense de Madrid
  • HathiTrust begins working with the University of Pittsburgh and University of Utah on ingest of locally-digitized materials
  • HathiTrust begins ingest of Utah State University Press backfile publications, to be made available in HathiTrust on an open access basis
  • HathiTrust begins ingest of Google-digitized volumes from Northwestern University and Purdue University, and Internet Archive-digitized volumes from North Carolina State University
  • HathiTrust concludes agreements with OCLC and EBSCO to make the HathiTrust full-text index available via their discovery services

September 2011

  • The University of Connecticut and University of Missouri join HathiTrust
  • HathiTrust, Google, and Duke University Press sign agreement to open access to DUP backfile volumes in HathiTrust under Creative Commons licenses
  • The Authors Guild and others file a lawsuit against HathiTrust alleging copyright infringement
  • HathiTrust begins working with the University of Florida and the University of North Carolina-Chapel Hill on ingest of locally-digitized materials
  • Partners submit final ballot proposals for the Constitutional Convention. Seven are submitted in all.

October 2011

November 2011

  • Boston College joins HathiTrust
  • The University of California begins offering reprints of UC-digitized public domain materials via HathiTrust
  • The User Experience Advisory Group releases HathiTrust User Personas

January 2012

  • HathiTrust reaches 10 million volumes

February 2014

  • HathiTrust reaches 11 million volumes