ME104 Linear Regression Analysis, 2012

ME104 Linear Regression Analysis

Professor Kenneth Benoit

London School of Economics and Political Science

Course Handout as pdf

Objectives and Learning Outcomes

This course focuses on building a greater understanding, theoretical underpinning, and tools for applying the linear regression model and its generalizations. With a practical focus, it explores the workings of multiple regression and problems that arise in applying it, as well as going deeper into the theory of inference underlying regression and most other statistical methods. The course also covers new classes of models for binary and count data, emphasizing the need to fit appropriate models to the underlying processes generating the data being explained.

This course is primarily about data analysis and developing a deeper understanding of the generalized linear model. The focus is on practice, and this focus is reflected in the choice of texts and in the emphasis on applied coursework. While this course deals to some degree with the generalized linear model on a mathematical and theoretical level, its main focus is practical, the ability to use the techniques when faced with the need in practical research. Consequently the learning method combines lectures and reading with hands-on statistical programming exercises using real datasets.

The learning outcomes associated with this nine-week course are aimed at students being able to:

  • Develop a deeper understanding of the linear regression model and its limitations;

  • Know how to diagnose and apply corrections to some problems with the generalized linear model found in real data; discussed;

  • Use and understand generalizations of the linear model to binary and count data;

  • Develop a greater familiarity with a range of techniques and methods through a diverse set of theoretical and applied readings;

  • Know where to go to learn more about the techniques in this class and those called for that were not covered in this class.


An introductory course in quantitative methods is a prerequisite for this course, since this course extends rather than introduces linear regression analysis.

The Stata statistical package is used for all exercises, and students in ME104 should be familiar or at least prepared to quickly learn this package.


Meetings. Classes will meet over ten days for a combination of lectures and lab sessions. Lectures will take place from 10:00–12:30 each day, with labs following from 13:30-15:00. The lab sessions will consist of guided exercises using Stata to implement practical data analysis.

Teaching assistant. The TA in 2012 is Carolina Plescia, Carolina will be leading the lab sessions.

Computer software. Stata 12 will be be used for all exercises.

Reading materials consist of:

  • Kennedy, Peter. 1998. A Guide to Econometrics. 5th ed. Oxford: Blackwell.This text provides a three-level discussion of each topic: first a general discussion, then a technical discussion, and then a very technical discussion. Most students find this quite useful since it permits them to dig as deep as their abilities let them or as their need allows.

  • Agresti, Alan and Barbara Finlay. 2009. Statistical Methods for the Social Sciences (4th Edition). Prentice Hall.Written by statisticians rather than econometricians, this text provides a very accessible treatment of the methods covered in this course in a traditional textbook format. For those who consider this too basic, I recommend Wonnacott and Wonnacott (below) as an alternative.

  • Wonnacott, Thomas H. and Ronald J. Wonaccott. 1990. Introductory Statistics. 5th Ed. New York: Wiley. This text is recommended, but not required, as a supplement or alternative to Agresti and Finlay if you should prefer W&W or find the A&F text too basic.

The home page for this course will be and you will be able to find this course handout there, links to exercises, and additional material.

Note that a great resource on Stata and other aspects of applied quantitative methodology can be found on the Department of Methodology’s YouTube channel,

Detailed Schedule

Day 1 –  The Error Term and Properties of Estimators

  • Required Reading:

    • Kennedy Ch 1–2, “Introduction” and “Criteria for Estimators"
    • A&F Review by reading Chs 4–5, “Probability Distributions” and “Statistical Inference: Estimation
  • Recommended Reading:

    • W&W Ch 7, “Point Estimation”
  • Homework: Exercise 1

Day 2 – The Classical Linear Regression Model

  • Required Reading:
    • Kennedy Ch 3 “Interval Estimation and Hypothesis Testing”
    • A&F Ch 19, “Multiple Regression and Correlation”
  • Recommended Reading:
    • W&W Chs 11–13, “Fitting a Line”, “Simple Regression”, and “Multiple Regression”
  • Homework: Exercise 2

Day 3 – Inference, Intervals, and Hypothesis Testing

  • Required Reading:
    • Kennedy Ch 3 “The Classical Linear Regression Model”
    • A&F 6, “Statistical Inference: Significance Tests”
  • Recommended Reading:
    • W&W 8, “Confidence Intervals” and Ch. 9 “Hypothesis Testing”
  • Homework:  Exercise 3

Day 4 – Diagnosing problems with the CLRM

  • Additional materials for lecture:  anscombe
  • Required Reading:
    • Kennedy Chs 5-6 “ Kennedy Chs 5–6, “Specification” and “Wrong Regressors, Nonlinearities, and Parameter Inconsistency”
    • A&F 14, “ 14, “Model Building with Multiple Regression”
  • Recommended Reading:
    • Berry, William D. and Stanley Feldman. 1993. “Multiple Regression in Practice.” In Regression Analysis, ed. Michael Lewis-Beck.
    • Fox, John. 1993. “Regression Diagnostics” In Regression Analysis, ed. Michael Lewis-Beck. London: Sage.
  • Homework:  Exercise 4

Day 5 – Problems with predictors

Day 6 – Problems with error assumptions

  • Additional materials for lecture:  dail2002.dta
  • Required Reading:
    • Kennedy Ch 7, “Non-zero expected disturbance”; Ch 8, “Non-spherical disturbances”; Ch 11, “Simultaneous Equations”
    • A&F Ch 16.6, “Structural Equation Models”
  • Recommended Reading:
  • Homework:  Exercise 6.

Day 7 – Models for binary data: Binary Logistic Regression

  • Required Reading:
    • Kennedy Ch 16, “Qualitative Dependent Variables”
    • A&F Ch 15.1–15.3, “Logisitic Regression”
  • Recommended Reading:
    • Aldrich, John H. and Forrest D. Nelson. 1984. Linear Probability, Logit, and Probit Models. Beverly Hills: Sage.
    • W&W Ch 18 “Maximum Likelihood Estimation”
  • Homework:  Exercise 7.

Day 8 –  Models for categorial and ordinal data

  • Required Reading:
    • Kennedy Ch 16.2-16.3
    • A&F Ch 15.4–15.6
  • Homework:  Exercise 8.

Day 9 –  Models for count data

  • Do file for lecture 9
  • Required Reading:
    • Kennedy Ch 16.4, “Count data”
    • A&F Ch 15.4–15.
  • Recommended Reading:
    • King, Gary. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Cambridge, England and New York: Cambridge University Press, 1989. Reprinted, Ann Arbor: University of Michigan Press, 1998. Chapter 5.7–5.10.
  • Homework:  Exercise 9.

Day 10 –  Estimating uncertainty in inferential models

  • Do file for lecture 10
  • Required Reading:
    • King, Gary; Michael Tomz; and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44(2, April): 341- 355.
  • Recommended Reading:
    • The Zelig package for R — this will allow you to implement the “CLARIFY” techniques in King, Tomz and Wittenberg (2000)
    • Kennedy Ch 21, “Robust Estimation”
    • King, Gary. 1986. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science 30(3):666-687.
Ken Benoit
Ken Benoit
Professor of Computational Social Science
comments powered by Disqus