Digital Scholarship Group

[Postponed] Text Analysis in R with Quanteda

Postponed: this event has been postponed in line with university guidelines on the coronavirus. We will reschedule the talk for later in the semester; the format (virtual or in-person) is yet to be determined.

Are you interested in using natural language processing (NLP) or text analysis in your research? R is one of the most commonly recommended languages for text analysis and NLP, partly because of its ecosystem of libraries designed to tackle common tasks such as corpus creation, cleaning and preprocessing, modeling, analysis, presentation, and export. In this workshop, we will compare some of these options (tm and tidytext for R, NLTK and spaCy for Python, etc.) before exploring quanteda, an R package for managing and analyzing textual data.

Quanteda is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.

The workshop will be paced for those with basic familiarity with R. We will focus on the fundamentals of text analysis and the quanteda package rather than on introductory R.

Registration (required but free) is available on the Harvard Training Portal.