Tidying Data with Google Sheets, OpenRefine, and Python (Remote Delivery) - Harvard University Digital Scholarship Group

[This workshop will be delivered via Zoom due to the COVID-19 pandemic]

In his paper "Tidy Data," Hadley Wickham riffs on Tolstoy: "Like families, tidy datasets are all alike but every messy dataset is messy in its own way." When we spend 75% of our "analysis" time cleaning and preprocessing data, it makes sense to focus on strategies to standardize our data. In this workshop, we will focus on correcting common errors in collected data and (re)structuring datasets to facilitate analysis.

We will be using Google Sheets, OpenRefine, and Python (pandas) for these tasks. You don't need to be a Pythonista, but some familiarity with Python or a similar scripting language will be helpful, as we won't be spending much time on syntax.
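
To give a flavor of the kind of task involved, here is a minimal pandas sketch of correcting entry errors and restructuring a table. The dataset, column names, and values are invented for illustration and are not taken from the workshop materials.

```python
import pandas as pd

# Hypothetical "messy" table: one column per year, with inconsistent text entries.
messy = pd.DataFrame(
    {
        "country": [" Chile", "chile ", "Peru"],
        "2019": [10, 10, 7],
        "2020": [12, 12, 9],
    }
)

# Correct common entry errors: stray whitespace and inconsistent capitalization.
messy["country"] = messy["country"].str.strip().str.title()

# Drop rows that became exact duplicates once the text was cleaned.
deduped = messy.drop_duplicates()

# Restructure from wide to long so that each row holds one observation.
tidy = deduped.melt(id_vars="country", var_name="year", value_name="cases")
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

The long layout, with one row per country and year, follows the "tidy" convention described in Wickham's paper and is generally easier to filter, group, and plot than the wide original.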

Registration is free but required; please register via the Harvard Training Portal.