Web Scraping (Part 2 of 2) - Harvard University Digital Scholarship Group

This two-day workshop teaches participants how to automate the extraction of data from websites and other online repositories into a well-formatted, locally stored dataset for later analysis. Web scraping tools make collecting large amounts of online information more efficient and help automate an otherwise tedious, time-consuming, and error-prone process.

The workshop introduces web structures and provides direct, hands-on experience with a series of scraping techniques that range from simple to complex, including tools for batch file downloading, a full workflow using only browser extensions, and advanced HTML and DOM parsing techniques using Python.

This workshop meets in person from 9 am to 12 pm on Friday, April 5th, and Friday, April 19th, in Lamont Library Room B-30.