Large-Scale Text Analysis with R

Home/Large-Scale Text Analysis with R

Large-Scale Text Analysis with R

Instructors

Description

Text collections such as the Google Books have provided scholars in many fields with convenient access to their materials in digital form, but text analysis at the scale of millions or billions of words still requires the use of tools and methods that may initially seem complex or esoteric to researchers in the humanities. Large-Scale Text Analysis with R will provide a practical introduction to a range of text analysis tools and methods. The course will include units on data extraction, stylistic analysis, authorship attribution, genre detection, gender detection, unsupervised clustering, supervised classification, topic modeling, and sentiment analysis. The main computing environment for the course will be R, “the open source programming language and software environment for statistical computing and graphics.” While no programming experience is required, students should have basic computer skills and be familiar with their computer’s file system and comfortable with the command line. The course will cover best practices in data gathering and preparation, as well as addressing some of the theoretical questions that arise when employing a quantitative methodology for the study of literature. Participants will be given a “sample corpus” to use in class exercises, but some class time will be available for independent work and participants are encouraged to bring their own text corpora and research questions so they may apply their newly learned skills to projects of their own.