Large-Scale Text Analysis with R

Courses 2014 · MD

Instructors

Mark Algee-Hewitt

Assistant Professor, Digital Humanities, Department of English, Stanford University

Description

Text mining, the practice of applying computational and statistical analysis to large collections of digitized text, is becoming an increasingly important way of extracting meaning from writing. It offers information that we could never access simply by reading the texts. But extracting this data can be difficult, both conceptually and methodologically, particularly for those who work in the humanities and who stand to benefit the most from these methods. “Large-Scale Text Analysis with R” will provide an introduction to the methods of text mining using the open-source software environment “R”. In this course, we will explore the different ways in which text mining can be used to “read” text, including authorship attribution, sentiment analysis, cluster analysis, and topic modeling. At the same time, we will focus on the analysis and interpretation of our results. How do we formulate research questions and hypotheses about text that can be answered quantitatively? Which methods best fit particular needs? And how can we use the numerical output of quantitative text analysis to explain features of the texts in ways that make sense to a wider audience? While no programming experience is required, students should have basic computer skills and be familiar with their computer’s file system. Participants will be given sample corpora to use in class exercises, but some class time will be available for independent work, and participants are encouraged to bring their own text corpora and research questions so that they can apply their newly learned skills to projects of their own.

Course Software

R: http://www.r-project.org/

RStudio Desktop: http://www.rstudio.com/products/rstudio/download/

Course Schedule

Course Website

Location

2102 Benjamin Building