HathiTrust Research Center Hosts Oct/Nov Virtual Workshops

October 8, 2024

The HathiTrust Research Center (HTRC) is hosting a virtual workshop series. The series will take place on three consecutive Mondays, October 21st, October 28th, and November 4th, each at 3:00 PM Eastern.

The Research Center facilitates text and data mining uses of the HathiTrust corpus, which contains over 18 million items digitized by partner libraries. HTRC tools and data range from off-the-shelf options to more advanced offerings for experienced scholars. Hosted by HTRC’s Janet Swatscheno, Ryan Dubnicek, and Jenny Christie, the workshops will allow attendees to gain experience with tools and data from HTRC.

The workshops will be held via Zoom and will include a mix of hands-on activities, discussion, and presentation. We will use breakout rooms to support the hands-on activities. You will not be required to install any software to participate in the workshops. The workshops are open to HathiTrust members as well as non-members. Review workshop descriptions and register for the events using the links below. You may register for individual sessions or all three.

Any requests or questions regarding scheduling or workshops can be directed to htrc-help@hathitrust.org.


Workshop Descriptions

Workshop 1: Introduction to HathiTrust and HTRC

Date: Monday, October 21st, 2024
Time: 3:00 – 4:30 PM Eastern
Duration: 1.5 hours
Registration form: https://forms.gle/dZ6k3ydSmEonqciu5

This workshop will introduce attendees to the data and computational tools of HathiTrust. HathiTrust operates a repository of over 18 million items digitized at a network of partner libraries. This massive collection is available for computational analysis primarily through the tools and services of the HathiTrust Research Center (HTRC). Attendees of this workshop will be introduced to the HathiTrust Digital Library as well as the HTRC and its data and analytical tools, including hands-on practice with HTRC Analytics.

No experience is required for this introductory workshop.

Workshop 2: HTRC Derived Datasets

Date: Monday, October 28th, 2024
Time: 3:00 – 4:30 PM Eastern
Duration: 1.5 hours
Registration form: https://forms.gle/dZ6k3ydSmEonqciu5 

This session will introduce you to HTRC’s derived datasets, how they can be used, and which types of research methods they are suited to. After each dataset is introduced, we’ll work hands-on with Python code in Google Colab notebooks, where we’ll use the Extracted Features 2.0 and BookNLP for English-Language Fiction datasets to conduct exploratory data analysis and visualization.

Note: working in Google Colab notebooks requires a Google account.

Recommended prerequisites: Either the Introduction to HathiTrust and HTRC workshop or some previous experience with HathiTrust or HTRC.


Workshop 3: HTRC Extracted Features API

Date: Monday, November 4th, 2024
Time: 3:00 – 4:30 PM Eastern
Duration: 1.5 hours
Registration form: https://forms.gle/dZ6k3ydSmEonqciu5 

This session will introduce you to the Extracted Features API (beta). The Extracted Features API extends volume-level and page-level access to our Extracted Features (EF) dataset. In this workshop, participants will learn more about the API and get hands-on experience with the basic API calls necessary to work with the data.

Note: working in Google Colab notebooks requires a Google account.

Recommended prerequisites: We recommend that participants take the HTRC Derived Datasets workshop.
