HathiTrust Research Center Hosts April Virtual Workshops

March 20, 2024

The HathiTrust Research Center (HTRC) is hosting a virtual workshop series in April. The Research Center supports text and data mining of the HathiTrust corpus, which contains over 18 million items digitized by partner libraries. HTRC tools and data range from off-the-shelf options to more advanced offerings for experienced scholars. Hosted by HTRC's Janet Swatscheno and Ryan Dubnicek, the workshops will give attendees experience with HTRC tools and data.

The workshops will be held via Zoom and will include a mix of hands-on activities, discussion, and presentations. We will use breakout rooms to support hands-on activities. You will not be required to install any software to participate. The workshops are open to HathiTrust members and non-members alike. Review the workshop descriptions and register using the links below. You may register for individual sessions or for all three.

This particular series has been scheduled for 9:00 am ET to accommodate requests from HathiTrust’s European members. Any requests or questions regarding scheduling or workshops can be directed to htrc-help@hathitrust.org.


Workshop 1: Introduction to HathiTrust and HTRC

Date: Tuesday, April 16
Time: 9:00 am ET / 8:00 am CT
Duration: 1.5 hours
Registration Form: https://forms.gle/b2Wr7VqNRtsGqBBK9

This workshop will introduce attendees to the data and computational tools of HathiTrust. HathiTrust operates a repository of over 18 million items digitized by a network of member libraries. This massive collection is available for computational analysis primarily through the tools and services of the HathiTrust Research Center (HTRC). Attendees will be introduced to the HathiTrust Digital Library as well as to the HTRC and its data and analytical tools, including hands-on practice with HTRC Analytics.

No experience is required for this introductory workshop.


Workshop 2: HTRC Extracted Features dataset

Date: Tuesday, April 23
Time: 9:00 am ET / 8:00 am CT
Duration: 2 hours
Registration Form: https://forms.gle/b2Wr7VqNRtsGqBBK9

This session will introduce you to the Extracted Features data model and the kinds of research it enables. The Extracted Features dataset (v.2.0) includes more than 17 million files, each representing a volume in the HathiTrust Digital Library. The files contain metadata about the volumes, as well as tokens (words), their parts of speech, and their per-page counts. The dataset suits text analysis projects whose algorithms need only the words in a volume and their counts, such as topic modeling or certain kinds of machine learning. This session will include a hands-on activity using the dataset.

Recommended prerequisites: Either the Introduction to HathiTrust and HTRC workshop or some previous experience with HathiTrust or HTRC.


Workshop 3: Introduction to HTRC Data Capsules

Date: Tuesday, April 30
Time: 9:00 am ET / 8:00 am CT
Duration: 2 hours
Registration Form: https://forms.gle/b2Wr7VqNRtsGqBBK9

This session introduces the HTRC Data Capsules environment and how intermediate and advanced researchers can use it. It will include a hands-on activity using an HTRC Data Capsule, Jupyter notebooks, and Python code.

No experience is required, and your level of participation is up to you! However, familiarity with Python is helpful.
