# Quantitative Text Analysis 2F, Essex 2013

Essex Summer School 2013

**Instructor:**
Prof Kenneth Benoit, LSE

**TAs:**
Dr. Paul Nulty, LSE;
Petra Martina Baumann, U. Salzburg

**Day 1: Introduction and Issues in Quantitative Text Analysis**

- slides
- exercise 1
**Texts** you will need for **exercise 1** are on the S: drive

**Day 2: Textual Data, Units of Analysis, Definitions of Features**

- slides
- demonstration of quanteda corpus creation and term-document matrix creation (in R)
- Zipf’s law demo code in Stata; in R (you need your own output .csv files)
- exercise 2

**Day 3: Research Strategies in Quantitative Text Analysis**

- slides
- exercise 3: the dictionary file (.lic) you will need is on the S: drive

**Day 4: Quantitative Methods for Comparing Texts**

**Day 5:Quantitative Content Analysis**

**Day 6: Automated Dictionary-Based Approaches**

**Day 7: Document Classification and Introduction to Machine Learning**

- slides
- additional slides from Tom Mitchell’s Machine Learning textbook
- exercise 7

**Day 8: Non-parametric scaling models for text**

**Day 9: Parametric Models for Text Scaling**

**Day 10: Working with Big Text Data: Twitter**

- slides part 1
- slides part 2
- exercise 10. This requires the R script phones.R, which is also linked from the assignment instructions. phones.R loads two files directly from the website: ip.Rdata, containing the iPhone-related tweets, and an.Rdata, containing those related to Android phones.
- If you want to play with a large dataset, you might also be interested in a corpus of 1.6 million English tweets, available from http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip (see the general web page at http://help.sentiment140.com/for-students).