Quantitative Text Analysis 2E, Essex 2014
Essex Summer School 2014
Instructor: Prof Kenneth Benoit, LSE
TA: Dr. Paul Nulty, LSE
Day 1: Quantitative text analysis overview and fundamentals
- slides
- demonstration .R code using US inaugural address speeches (downloaded from https://archive.org/details/Inaugural-Address-Corpus-1789-2009)
- exercise 1 instructions and starter code
- texts you will need for **exercise 1** (on the S: drive)
Day 2: The Elements of Textual Data
- slides
- Zipf’s law demo code in Stata and in R (you will need to supply your own output .csv files)
- exercise 2 instructions and starter code
- exercise 2 solution
Day 3: Descriptive Statistical Methods for Texts
- slides
- exercise 3 instructions and starter code
- exercise 3 solution
Day 4: Quantitative methods for comparing texts
- slides
- exercise 4 instructions
- exercise 4 solution
Day 5: Automated Dictionary-Based Approaches
- slides
- exercise 5 instructions
- exercise 5 solution
Day 6: Document classifiers
- slides
- exercise 6 instructions
- exercise 6 solution
Day 7: Supervised Models for Scaling Texts
- slides
- additional slides from Tom Mitchell’s Machine Learning textbook
- exercise 7 instructions
- exercise 7 solution
Day 8: Unsupervised scaling models for text
- slides
- exercise 8 instructions
- exercise 8 solution
Day 9: Topic Models
- slides
- exercise 9 instructions
- exercise 9 solution
Day 10: Working with Big Text Data: Twitter
- exercise 10 instructions
- exercise 10 solution
If you want to experiment with a large dataset, a corpus of 1.6 million English tweets is available from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip (see the general web page at http://help.sentiment140.com/for-students).
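For a corpus of this size it may help to stream the file row by row rather than load it all at once. The following is a minimal sketch in Python, not part of the course materials (which use R and Stata). It assumes the CSV files have no header row and six quoted fields in the order polarity, id, date, query, user, text, as described on the dataset's documentation page. The function names and the sample rows are hypothetical and made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column layout, taken from the dataset's documentation page;
# the CSV files themselves are assumed to carry no header row.
FIELDS = ["polarity", "id", "date", "query", "user", "text"]

def read_tweets(handle):
    """Yield one dict per tweet from an open text-mode CSV handle."""
    for row in csv.reader(handle):
        if len(row) != len(FIELDS):
            continue  # skip malformed lines rather than guess at their layout
        yield dict(zip(FIELDS, row))

def polarity_counts(handle):
    """Count tweets per polarity label without holding the file in memory."""
    return Counter(tweet["polarity"] for tweet in read_tweets(handle))

# Tiny made-up sample in the assumed format (not real rows from the corpus).
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","so tired today"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","great lecture, thanks"\n'
    '"4","3","Mon Apr 06 22:21:10 PDT 2009","NO_QUERY","user_c","loving the sunshine"\n'
)
counts = polarity_counts(sample)
print(counts)
```

To run this on the real data, replace `sample` with a handle opened on the unzipped training file, for example `open(path, newline="", encoding="latin-1")`. The encoding is an assumption and may need adjusting.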