Are you interested in using natural language processing (NLP) or text analysis in your research? R is one of the most frequently recommended languages for text analysis and NLP, partly because of its ecosystem of libraries designed to tackle common tasks such as corpus creation, cleaning and preprocessing, modeling, analysis, presentation, and export. In this workshop, we will compare some of the available options (tm and tidytext for R, NLTK and spaCy for Python, etc.) before exploring quanteda, an R package for managing and analyzing textual data.
The quanteda package is designed for R users who need to apply natural language processing to texts, from the original documents through to the final analysis. Its capabilities match or exceed those of many end-user software applications, a number of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. Furthermore, by emphasizing consistent design, quanteda lowers the barriers to learning and using NLP and quantitative text analysis, even for proficient R programmers.
Prerequisites: Basic familiarity with R. We will focus on the fundamentals of text analysis and the quanteda package rather than on introductory R.