Quantitative Text Analysis 2F, Essex 2013
Quantitative Text Analysis 2F
Essex Summer School 2013
Instructor: Prof Kenneth Benoit, LSE
TAs: Dr. Paul Nulty, LSE; Petra Martina Baumann, U. Salzburg
Day 1: Introduction and Issues in Quantitative Text Analysis
- slides
- exercise 1
- the texts you will need for exercise 1 are on the S: drive
Day 2: Textual Data, Units of Analysis, Definitions of Features
- slides
- demonstration of quanteda corpus creation and term-document matrix creation (in R)
- Zipf’s law demo code in Stata; in R (you need your own output .csv files)
- exercise 2
Day 3: Research Strategies in Quantitative Text Analysis
- slides
- exercise 3 – the dictionary file (.lic) you will need is on the S: drive
Day 4: Quantitative methods for comparing texts
Day 5: Quantitative Content Analysis
Day 6: Automated Dictionary-Based Approaches
Day 7: Document classification and introduction to machine learning
- slides
- additional slides from Tom Mitchell’s Machine Learning textbook
- exercise 7
Day 8: Non-parametric scaling models for text
Day 9: Parametric Models for Text Scaling
Day 10: Working with Big Text Data: Twitter
- slides part 1
- slides part 2
- exercise 10. This requires the R script phones.R, which is also linked from the assignment instructions. The two files that phones.R loads directly from the website are ip.Rdata, for the iPhone-related tweets, and an.Rdata, for those related to Android phones.
- If you want to experiment with a large dataset, you might also be interested in a corpus of 1.6 million English tweets available from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip (see the general web page at http://help.sentiment140.com/for-students).