In this workshop, you will get a sense of what is possible when working with humanities data and understand how humanities scholars approach “data”. We will introduce multiple scenarios with different datasets to help you develop strategies for organizing and cleaning data. Tools may include OpenRefine, Google Sheets, and R with RStudio. Attendees should plan on some prework, including installations and brief background readings.
Note that the workshop will take place on two different days. A reminder will be sent out one day in advance of each workshop in the series. You need only register for the first day; all registrations will be automatically transferred across the series. Please bring a laptop with you.
Schedule:
Day 1: Friday, October 14, 9am-12pm, Lamont B-30
We will learn about approaches to organizing data in spreadsheets, as well as some of the most useful formulas and tools in Google Sheets. This session will help you avoid common organizational pitfalls and get set up to take advantage of time-saving automation with minimal effort. We'll also tackle using regular expressions and OpenRefine to clean up data that isn't well-formatted or consistent.
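To give a flavor of regular-expression cleanup, here is a minimal sketch in Python (the session itself works in Google Sheets and OpenRefine). The function name `normalize_date` and the assumption that dates arrive in month/day/year order are illustrative only, not part of the workshop materials.

```python
import re


def normalize_date(raw: str) -> str:
    """Collapse stray whitespace; rewrite M/D/YYYY-style dates as YYYY-MM-DD.

    Assumes month-first order (an assumption for this sketch). Values that
    do not look like a date are returned with whitespace tidied only.
    """
    text = re.sub(r"\s+", " ", raw).strip()
    # Accept "/", "." or "-" as the separator between date parts.
    match = re.fullmatch(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", text)
    if match is None:
        return text
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"
```

The same pattern could be adapted to other inconsistent fields, such as name or place-name variants.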
Day 2: Friday, October 21, 9am-12pm, Lamont B-30
Take data clean-up and normalization to the next level and learn how to combine data sources using R, an open source programming language used widely for data analysis. We'll also learn how to use application programming interfaces (APIs) within R. APIs power the modern web by allowing developers to combine functionality from different sources and researchers to access data programmatically.
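As a rough illustration of what "combining data sources" means, the sketch below joins two small tables on a shared key. It is written in Python purely for illustration (the session itself uses R), and the helper `left_join`, the field names, and the sample rows are all hypothetical.

```python
def left_join(left, right, key):
    """Attach fields from `right` to each row of `left` that shares `key`.

    Rows in `left` with no match in `right` are kept unchanged.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Hypothetical sample data: a table of letters and a table of authors.
letters = [
    {"author_id": "a1", "year": "1848"},
    {"author_id": "a2", "year": "1851"},
]
authors = [{"author_id": "a1", "name": "Author One"}]

combined = left_join(letters, authors, "author_id")
```

Here the first letter gains a `name` field, while the second is kept as-is because no author record matches it.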