Angrist & Pischke, Mostly Harmless Econometrics
Chapter 3: Making Regression Make Sense
Core Message
Regression is useful because it provides the best linear approximation to the Conditional Expectation Function (CEF). Whether a regression has a causal interpretation depends on the Conditional Independence Assumption (CIA).
3.1 Regression Fundamentals
3.1.1 The Conditional Expectation Function (CEF)
The CEF is the expected value of Yi given Xi:
E[Yi | Xi]
Example: The CEF of log wages given schooling is upward-sloping: each additional year of schooling is associated with roughly 10% higher average earnings.
The Law of Iterated Expectations
An unconditional expectation equals the expectation of the CEF:
E[Yi] = E[ E[Yi | Xi] ]
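A quick numerical check of the law on simulated data; the binary group structure and group means below are illustrative assumptions, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a binary group X and an outcome Y with group-specific means
x = rng.integers(0, 2, size=100_000)
y = np.where(x == 1, 2.0, 1.0) + rng.normal(size=x.size)

# Left side: the unconditional expectation E[Y]
lhs = y.mean()

# Right side: the CEF E[Y | X] averaged over the distribution of X
cef = np.array([y[x == 0].mean(), y[x == 1].mean()])
px = np.array([(x == 0).mean(), (x == 1).mean()])
rhs = (cef * px).sum()

print(lhs, rhs)  # identical up to floating-point error
```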
Three Key Properties of the CEF
Property 1: CEF Decomposition
Yi = E[Yi | Xi] + εi
where:
- εi is mean-independent of Xi: E[εi | Xi] = 0
- εi is uncorrelated with any function of Xi
→ Any random variable can be decomposed into a part "explained by X" (the CEF) and an orthogonal residual.
Property 2: CEF Prediction
E[Yi | Xi] = argmin over m(Xi) of E[(Yi − m(Xi))²]
→ The CEF is the Minimum Mean Squared Error (MMSE) predictor of Y given X.
Property 3: ANOVA Theorem
V(Yi) = V(E[Yi | Xi]) + E[V(Yi | Xi)]
→ Total variance = variance explained by X + mean residual variance
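The decomposition holds exactly in the empirical distribution of any sample, which a simulation makes easy to verify; the three-group design and heteroskedastic noise below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical discrete X with three groups and heteroskedastic noise
x = rng.integers(0, 3, size=200_000)
y = x.astype(float) + rng.normal(scale=1.0 + x, size=x.size)

total = y.var()  # V(Y), using the empirical (ddof=0) variance

p = np.array([(x == g).mean() for g in range(3)])         # P(X = g)
cef = np.array([y[x == g].mean() for g in range(3)])      # E[Y | X = g]
cond_var = np.array([y[x == g].var() for g in range(3)])  # V(Y | X = g)

explained = (p * (cef - (p * cef).sum()) ** 2).sum()  # V(E[Y | X])
residual = (p * cond_var).sum()                       # E[V(Y | X)]

print(total, explained + residual)  # equal up to floating-point error
```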
3.1.2 Linear Regression and the CEF
The population regression coefficient is defined as the solution to the population least-squares problem:
β = argmin over b of E[(Yi − Xi'b)²]
Solution:
β = E[XiXi']⁻¹ E[XiYi]
Regression Anatomy Formula
For the k-th regressor in a multivariate regression:
βk = Cov(Yi, x̃ki) / V(x̃ki)
where x̃ki is the residual from regressing xki on all other covariates.
Interpretation: Each coefficient in a multivariate regression is the bivariate slope after "partialling out" all other variables.
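A minimal sketch of the anatomy formula on simulated data (the data-generating process is hypothetical): the multivariate coefficient on x1 equals the bivariate slope of y on the residualized x1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical DGP: x1 is the regressor of interest, x2 a correlated control
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Coefficient on x1 from the full multivariate regression
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Regression anatomy: residualize x1 on a constant and x2, then take the
# bivariate slope of y on that residual
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
beta_anatomy = np.cov(y, x1_tilde, ddof=0)[0, 1] / x1_tilde.var()

print(beta_full, beta_anatomy)  # identical up to floating-point error
```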
Three Justifications for Regression
| Theorem | Statement | When it applies |
|---|---|---|
| Linear CEF | If the CEF is linear, regression gives you the CEF | Joint normality, saturated models |
| Best Linear Predictor | X'β is the best linear predictor of Y (MMSE) | Always |
| Regression-CEF | X'β provides the best linear approximation to E[Y\|X] | Always (even if CEF is nonlinear) |
Key insight: Even if the CEF is nonlinear, regression provides the best linear approximation to it. This is the most general justification for using regression.
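To see the Regression-CEF theorem at work, the sketch below (with an assumed exponential CEF) compares the OLS slope to the slope of the best linear approximation to the CEF itself; in the population the two are the same number:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000

# Hypothetical nonlinear CEF: E[Y | X] = exp(X), with standard normal X
x = rng.normal(size=n)
y = np.exp(x) + rng.normal(size=n)

# OLS slope of Y on X
X = np.column_stack([np.ones(n), x])
ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Slope of the best linear approximation to the CEF itself: Cov(E[Y|X], X)/V(X)
cef_slope = np.cov(np.exp(x), x, ddof=0)[0, 1] / x.var()

print(ols_slope, cef_slope)  # agree up to sampling noise; equal in the population
```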
3.1.3 Asymptotic OLS Inference
The OLS estimator replaces the population moments with sample moments:
β̂ = [Σi XiXi']⁻¹ Σi XiYi
Key Asymptotic Results
| Result | What it says |
|---|---|
| Law of Large Numbers | Sample moments → Population moments |
| Central Limit Theorem | √N(β̂ − β) → Normal distribution |
| Slutsky's Theorem | Terms that converge in probability to a constant can be treated as that constant in limiting distributions |
Heteroskedasticity-Robust Standard Errors
The robust (White) variance estimator:
V̂(β̂) = [Σi XiXi']⁻¹ [Σi XiXi' êi²] [Σi XiXi']⁻¹
where êi are the OLS residuals.
Why use robust SEs?
- If CEF is nonlinear, residuals vary with X → heteroskedasticity is natural
- Default (homoskedastic) SEs assume E[ei² | Xi] = σ² (constant)
- Robust SEs are valid whether or not this assumption holds (see the sketch below)
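A minimal statsmodels sketch, with heteroskedasticity built into a hypothetical data-generating process; HC1 is one of several robust options the library offers:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical data: the noise scale grows with x, so E[e² | x] is not constant
x = rng.uniform(0, 2, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + x, size=n)

X = sm.add_constant(x)
conventional = sm.OLS(y, X).fit()          # homoskedastic standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # White/robust standard errors

print(conventional.bse)
print(robust.bse)  # typically larger here, reflecting the heteroskedasticity
```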
3.1.4 Saturated Models
Definition: A saturated model has a separate parameter for every possible value of X.
Example with two dummies (x1 = college, x2 = female):
E[Yi | x1i, x2i] = α + β·x1i + γ·x2i + δ·(x1i × x2i)
| Term | Name | Interpretation |
|---|---|---|
| β, γ | Main effects | Effect of each dummy when the other equals zero |
| δ | Interaction term | How college effect differs by gender |
Key point: Saturated models fit the CEF perfectly because the CEF is linear in the dummy regressors.
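A sketch with simulated dummies (the coefficients in the data-generating process are made up) showing that the saturated regression reproduces every conditional mean exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Hypothetical dummies: x1 = college, x2 = female
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
y = 1.0 + 0.4 * x1 + 0.1 * x2 - 0.05 * x1 * x2 + rng.normal(scale=0.3, size=n)

# Saturated regression: constant, both main effects, and the interaction
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
alpha, beta, gamma, delta = np.linalg.lstsq(X, y, rcond=None)[0]

# Fitted values reproduce all four conditional means exactly
for a in (0, 1):
    for b in (0, 1):
        cell_mean = y[(x1 == a) & (x2 == b)].mean()
        fitted = alpha + beta * a + gamma * b + delta * a * b
        print(a, b, round(cell_mean, 4), round(fitted, 4))
```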
3.2 Regression and Causality
Central Question: When does regression have a causal interpretation?
Answer: When the CEF it approximates is causal, which requires the Conditional Independence Assumption (CIA).
3.2.1 The Conditional Independence Assumption (CIA)
Setup: Potential Outcomes
For schooling s, let Ysi = fi(s) denote person i's potential earnings with s years of education.
The CIA states:
{Ysi} ⊥ si | Xi, for all s
"Potential outcomes are independent of actual schooling, conditional on X"
What does CIA mean?
- Selection on observables: Xi captures all reasons why schooling and potential outcomes are correlated
- As good as random: Conditional on X, schooling is "as good as randomly assigned"
Implications of CIA
Given CIA, conditional-on-X comparisons are causal:
E[Yi | si = s, Xi] − E[Yi | si = s−1, Xi] = E[Ysi − Y(s−1)i | Xi]
→ The difference in mean earnings between adjacent schooling levels, conditional on X, has a causal interpretation!
From CIA to Regression
Assume a linear constant-effects model:
Ysi = fi(s) = α + ρs + ηi
where ηi is the random part of potential earnings.
Decompose ηi into a linear function of the observables and a residual:
ηi = Xi'γ + vi
The causal regression model becomes:
Yi = α + ρsi + Xi'γ + vi
Given CIA, vi is uncorrelated with si and Xi, so the regression coefficient ρ is the causal effect of schooling.
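A simulation sketch of this logic, under an assumed data-generating process in which a single observed X drives both schooling and earnings, so the CIA holds by construction:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
rho = 0.10  # assumed true causal return to a year of schooling

# A single observed confounder X drives both schooling and earnings
X = rng.normal(size=n)
s = 12.0 + 2.0 * X + rng.normal(size=n)
y = 1.0 + rho * s + 0.5 * X + rng.normal(scale=0.2, size=n)

ones = np.ones(n)
short = np.linalg.lstsq(np.column_stack([ones, s]), y, rcond=None)[0][1]
long = np.linalg.lstsq(np.column_stack([ones, s, X]), y, rcond=None)[0][1]

print(short)  # biased upward: omits X
print(long)   # ≈ 0.10: conditioning on X makes s as good as randomly assigned
```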
3.2.2 The Omitted Variables Bias (OVB) Formula
Consider a "long" regression that includes an ability control Ai:
Yi = α + ρsi + γAi + ei
And a "short" regression that omits Ai:
Yi = α′ + ρˢsi + ui
The OVB Formula
ρˢ = ρ + γ·δAs
Short = Long + (Effect of omitted) × (Regression of omitted on included)
where δAs is the coefficient from regressing Ai on si.
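The OVB formula is an exact algebraic property of OLS, which a simulation with a hypothetical ability variable makes easy to check:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Hypothetical ability A, positively correlated with schooling s
A = rng.normal(size=n)
s = 12.0 + 1.5 * A + rng.normal(size=n)
y = 1.0 + 0.08 * s + 0.4 * A + rng.normal(scale=0.2, size=n)

ones = np.ones(n)
long_fit = np.linalg.lstsq(np.column_stack([ones, s, A]), y, rcond=None)[0]
rho_long, gamma = long_fit[1], long_fit[2]
rho_short = np.linalg.lstsq(np.column_stack([ones, s]), y, rcond=None)[0][1]

# delta_As: slope from regressing the omitted A on the included s
delta_As = np.linalg.lstsq(np.column_stack([ones, s]), A, rcond=None)[0][1]

print(rho_short, rho_long + gamma * delta_As)  # the identity holds exactly
```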
Application: Returns to Schooling
| Controls | Schooling Coefficient |
|---|---|
| None | 0.132 |
| Age dummies | 0.131 |
| + Family background | 0.114 |
| + AFQT score | 0.087 |
| + Occupation dummies | 0.066 |
Source: NLSY data
→ Coefficient decreases as we add controls that are positively correlated with both wages and schooling.
3.2.3 Bad Control
Bad controls are variables that are themselves outcomes of the treatment.
Good controls are variables determined before the treatment.
Example: Controlling for Occupation
Should we control for occupation in a schooling regression?
Problem: College affects occupation choice!
- wi = 1 if white collar job
- College → more likely white collar
Comparing college (ci = 1) and non-college (ci = 0) workers within the white-collar occupation:
E[Yi | wi = 1, ci = 1] − E[Yi | wi = 1, ci = 0]
= E[Y1i − Y0i | w1i = 1] + {E[Y0i | w1i = 1] − E[Y0i | w0i = 1]}
where the first term is a causal effect and the term in braces is selection bias from the change in composition.
Why is this biased?
- College graduates in white-collar jobs are typical graduates
- Non-graduates in white-collar jobs are exceptional, positively selected non-graduates
- → Conditioning on occupation compares different types of people (see the simulation below)
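A simulation sketch of the bad-control problem, under an assumed setup in which college is randomly assigned and white-collar status is an outcome of both college and unobserved ability:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# College c is randomly assigned; ability u is unobserved
c = rng.integers(0, 2, size=n)
u = rng.normal(size=n)

# White-collar status w is an OUTCOME of both college and ability
w = ((0.5 * c + u + rng.normal(size=n)) > 0.5).astype(float)

# Earnings: the true causal effect of college is 0.5
y = 0.5 * c + u + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
good = np.linalg.lstsq(np.column_stack([ones, c]), y, rcond=None)[0][1]
bad = np.linalg.lstsq(np.column_stack([ones, c, w]), y, rcond=None)[0][1]

print(good)  # ≈ 0.5: college is randomized, so no control is needed
print(bad)   # attenuated: within w, non-graduates are positively selected on u
```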
Proxy Control Problem
What if we use a "late" ability measure ai, observed after schooling is complete? Suppose it is generated by:
ai = π0 + π1·si + π2·Ai
where Ai is innate ability. If schooling increases measured ability (π1 > 0), controlling for late ability biases the schooling coefficient downward.
Rule of Thumb
Timing matters!
- ✅ Variables measured before treatment → Good controls
- ❌ Variables measured after treatment → Potentially bad controls
Chapter 3 Summary
| Concept | Key Point |
|---|---|
| CEF | E[Y\|X], the MMSE predictor of Y given X |
| Regression | Best linear approximation to the CEF |
| Regression Anatomy | βk = bivariate slope after partialling out other Xs |
| CIA | Ys ⊥ s \| X, which makes regression causal |
| OVB Formula | Short = Long + (Omitted effect) × (Omitted on included) |
| Bad Control | Don't control for outcomes of treatment |
References
- Barnow, B., Cain, G., & Goldberger, A. (1981). Selection on observables. Evaluation Studies Review Annual.
- White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator. Econometrica.
- Frisch, R., & Waugh, F. (1933). Partial time regressions as compared with individual trends. Econometrica.
- Angrist, J. (1998). Estimating the labor market impact of voluntary military service. Econometrica.
Suhyeon Lee