Angrist & Pischke, Mostly Harmless Econometrics
Chapter 2: The Experimental Ideal
Core Message
The most credible and influential research designs use random assignment.
2.1 The Selection Problem
Motivating Example: Do hospitals make people healthier?
Comparing health status by hospitalization using NHIS data:
| Group | Sample Size | Mean Health Status | Std. Error |
|---|---|---|---|
| Hospitalized | 7,774 | 2.79 | 0.014 |
| Not Hospitalized | 90,049 | 2.07 | 0.003 |
Difference: 0.71 (t-stat = 58.9; the rounded means above give 0.72). Higher values indicate worse health → Hospitals appear to make people sicker!
Why this result? People who go to hospitals are sicker to begin with.
The Potential Outcomes Framework
Core concept of the Rubin Causal Model (Rubin, 1974, 1977; Holland, 1986)
Notation:
- Di ∈ {0, 1}: Treatment status (e.g., hospitalization)
- Yi: Observed outcome
- Y1i: Potential outcome if treated
- Y0i: Potential outcome if not treated
Causal effect for individual i: Y1i − Y0i
Observed outcome:
Yi = Y0i + (Y1i − Y0i) · Di
Formal Decomposition of Selection Bias (Step by Step)
Step 1: Starting Point
What we can observe:
"Average health of those who went to hospital" − "Average health of those who didn't"
Step 2: Replace observed Y with potential outcomes
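Key: For Di = 1, we only observe Y1i. For Di = 0, we only observe Y0i.
E[Yi | Di = 1] = E[Y1i | Di = 1]
E[Yi | Di = 0] = E[Y0i | Di = 0]
Therefore:
E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i | Di = 1] − E[Y0i | Di = 0]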
Step 3: The Trick! Add and subtract the same term
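Add and subtract E[Y0i | Di = 1] (= adding zero):
E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i | Di = 1] − E[Y0i | Di = 1] + E[Y0i | Di = 1] − E[Y0i | Di = 0]
The two middle terms, − E[Y0i | Di = 1] + E[Y0i | Di = 1], cancel out = 0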
Step 4: Rearrange terms
= E[Y1i − Y0i | Di=1] + E[Y0i|Di=1] − E[Y0i|Di=0]
Step 5: Meaning of each term
| Term | Formula | Meaning |
|---|---|---|
| ATT | E[Y1i − Y0i \| Di=1] | Average treatment effect on the treated |
| Selection Bias | E[Y0i\|Di=1] − E[Y0i\|Di=0] | Baseline difference without treatment |
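The decomposition can be checked on simulated data, where both potential outcomes are visible. The following is a minimal sketch under assumed values: the population, the effect of −0.5, and the selection rule are all hypothetical and are not taken from the NHIS data.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population; higher score = worse health.
people = []
for _ in range(10_000):
    y0 = rng.gauss(2.5, 1.0)                        # health without hospitalization
    y1 = y0 - 0.5                                   # assumed effect: hospital improves health by 0.5
    d = 1 if y0 + rng.gauss(0, 1.0) > 3.0 else 0    # sicker people are more likely to go
    people.append((y0, y1, d))

treated = [(y0, y1) for y0, y1, d in people if d == 1]
control = [(y0, y1) for y0, y1, d in people if d == 0]

# Observed comparison: E[Yi | Di = 1] - E[Yi | Di = 0]
naive = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)
# ATT: E[Y1i - Y0i | Di = 1]
att = mean(y1 - y0 for y0, y1 in treated)
# Selection bias: E[Y0i | Di = 1] - E[Y0i | Di = 0]
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(f"naive = {naive:.3f}, ATT = {att:.3f}, selection bias = {selection_bias:.3f}")
```

In this sketch the naive difference is positive even though the treatment lowers the score for everyone, because the selection bias term is larger than the ATT in absolute value.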
Intuitive Understanding
ATT (Average Treatment effect on the Treated):
- E[Y1i | Di = 1]: Health of hospitalized people (after going)
- E[Y0i | Di = 1]: Health they would have had if they hadn't gone
- Difference = The true effect of the hospital
Selection Bias:
- E[Y0i | Di = 1]: Health hospitalized people would have had without going (sicker to begin with)
- E[Y0i | Di = 0]: Health of non-hospitalized people (healthier to begin with)
- Difference = Gap from comparing different people
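Numerical Example
The counterfactual value 3.50 is hypothetical, chosen only to illustrate the decomposition; it can never be observed.
| | Hospitalized | Not Hospitalized |
|---|---|---|
| Observed health E[Yi\|Di] | 2.79 | 2.07 |
| Health if not hospitalized E[Y0i\|Di] | 3.50 (unobserved) | 2.07 |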
Observed difference: 2.79 − 2.07 = 0.72
Decomposition:
- ATT = 2.79 − 3.50 = −0.71 (hospital makes people healthier!)
- Selection Bias = 3.50 − 2.07 = +1.43 (sicker people go to hospital)
Observed = ATT + Selection Bias
→ Selection bias (+1.43) completely masks the true effect (−0.71)!
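The arithmetic of the numerical example can be reproduced in a few lines. This is only a sketch of the calculation above; the variable names are illustrative, and the 3.50 counterfactual is the hypothetical value used in the example, not an estimate.

```python
observed_treated = 2.79        # E[Yi | Di = 1], from the NHIS table
observed_control = 2.07        # E[Yi | Di = 0], from the NHIS table
counterfactual_treated = 3.50  # E[Y0i | Di = 1]: hypothetical, never observed

att = observed_treated - counterfactual_treated
selection_bias = counterfactual_treated - observed_control
observed_diff = observed_treated - observed_control

print(round(att, 2), round(selection_bias, 2), round(observed_diff, 2))
# prints: -0.71 1.43 0.72
```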
2.2 Random Assignment Solves the Selection Problem
Key Principle: Random assignment makes Di independent of potential outcomes.
Mathematical Derivation
Under random assignment:
E[Yi|Di=1] − E[Yi|Di=0]
= E[Y1i|Di=1] − E[Y0i|Di=0]
= E[Y1i|Di=1] − E[Y0i|Di=1] (by independence)
= E[Y1i − Y0i|Di=1]
= E[Y1i − Y0i] (= ATE, Average Treatment Effect)
→ Selection bias disappears, and we can directly estimate the ATE!
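A small simulation can illustrate the derivation: with a coin-flip assignment the difference in means lands close to the ATE, while self-selection on baseline health does not. This is a sketch under assumed values; the population parameters and the selection rule `y0 > 3.0` are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)

def simulate(randomized, n=20_000):
    """Return (difference in means, true ATE) for one hypothetical sample."""
    effects, y_treated, y_control = [], [], []
    for _ in range(n):
        y0 = rng.gauss(2.5, 1.0)
        y1 = y0 - 0.5 + rng.gauss(0, 0.2)    # heterogeneous effect with mean -0.5
        if randomized:
            d = rng.random() < 0.5           # coin flip, independent of (y0, y1)
        else:
            d = y0 > 3.0                     # self-selection on baseline health
        effects.append(y1 - y0)
        (y_treated if d else y_control).append(y1 if d else y0)
    return mean(y_treated) - mean(y_control), mean(effects)

exp_diff, exp_ate = simulate(randomized=True)
obs_diff, obs_ate = simulate(randomized=False)
print(f"randomized:    diff = {exp_diff:.2f}, ATE = {exp_ate:.2f}")
print(f"self-selected: diff = {obs_diff:.2f}, ATE = {obs_ate:.2f}")
```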
Empirical Examples: Non-experimental vs. Randomized Studies
| Research Area | Non-experimental Finding | Randomized Trial Result |
|---|---|---|
| Hormone Replacement Therapy | Nurses' Health Study: HRT users healthier | WHI: Few benefits, serious side effects |
| Job Training Programs | Trainees earn less than non-trainees | Mostly positive effects (Lalonde, 1986) |
2.3 The Tennessee STAR Experiment
Experiment Overview
- Purpose: Estimate effects of class size on student achievement
- Duration: Started 1985/86, ran for 4 years (K through 3rd grade)
- Scale: ~11,600 students, cost ~$12 million
- Treatment Arms:
  - Small classes (13-17 students)
  - Regular classes (22-25 students) with a part-time aide
  - Regular classes with a full-time aide
Balance Check: Verifying Random Assignment
Compare pre-treatment characteristics across groups:
| Variable | Small | Regular | Reg/Aide | P-value |
|---|---|---|---|---|
| Free lunch | .47 | .48 | .50 | .09 |
| White/Asian | .68 | .67 | .66 | .26 |
| Age in 1985 | 5.44 | 5.43 | 5.42 | .32 |
| K class size | 15.10 | 22.40 | 22.80 | .00 |
| K percentile score | 54.70 | 48.90 | 50.00 | .00 |
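✅ Student characteristics (free lunch, race, age) are balanced → Random assignment worked
The last two rows are not pre-treatment characteristics: K class size confirms that the treatment was delivered as intended, and K percentile score is an outcome.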
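A balance check of this kind can be sketched as a test for equal covariate means across arms. The table's p-values compare all three arms jointly; the sketch below is a simpler two-group version using a Welch t-statistic, run on hypothetical simulated students. The function name `balance_t` and all data are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance

def balance_t(group_a, group_b):
    """Welch t-statistic for the difference in means of one covariate."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

rng = random.Random(2)
# Hypothetical students: free-lunch status is fixed before assignment.
free_lunch = [1 if rng.random() < 0.48 else 0 for _ in range(6_000)]
small = [rng.random() < 0.3 for _ in free_lunch]    # random assignment to small classes

t_stat = balance_t(
    [x for x, s in zip(free_lunch, small) if s],
    [x for x, s in zip(free_lunch, small) if not s],
)
print(f"t-statistic for free lunch: {t_stat:.2f}")
```

Under random assignment the statistic should usually be small in absolute value; a large value for a pre-treatment covariate would be a warning sign.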
Main Results
Estimated effects on kindergarten percentile scores, with standard errors in parentheses:
| Variable | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Small class | 4.82 (2.19) | 5.37 (1.26) | 5.36 (1.21) | 5.37 (1.19) |
| Regular/aide | .12 (2.23) | .29 (1.13) | .53 (1.09) | .31 (1.07) |
| School FE | No | Yes | Yes | Yes |
| Student controls | No | No | Yes | Yes |
Key Findings:
- Small class effect: ~5-6 percentile points improvement
- Effect size: ~0.2 standard deviations (σ)
- Regular/aide effect: Small and statistically insignificant
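The significance statements can be read directly off the table by dividing each estimate by its standard error. This is a rough sketch using the numbers copied from the results table; comparing the ratio to roughly 2 is the usual rule of thumb, not a method stated in the source.

```python
# (estimate, standard error) pairs copied from columns (1)-(4) of the results table
small_class = [(4.82, 2.19), (5.37, 1.26), (5.36, 1.21), (5.37, 1.19)]
regular_aide = [(0.12, 2.23), (0.29, 1.13), (0.53, 1.09), (0.31, 1.07)]

def t_ratios(rows):
    """Estimate divided by its standard error, one value per column."""
    return [est / se for est, se in rows]

for name, rows in [("Small class", small_class), ("Regular/aide", regular_aide)]:
    print(name, [round(t, 2) for t in t_ratios(rows)])
```

Every small-class ratio exceeds 2, while every regular/aide ratio is below 0.5, which matches the findings listed above.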
2.4 The Attrition Problem
Definition
Attrition: Participants dropping out during the course of an experiment
Attrition in the STAR Experiment
| Time Point | Number of Students |
|---|---|
| Start (Kindergarten) | ~11,600 |
| End (3rd Grade) | Some attrition |
Reasons for attrition:
- School transfers
- Dropping out
- Refusal to continue participation
- Missing data
Why Is This a Problem?
Key issue: Attrition may not be random!
| Scenario | Problem |
|---|---|
| Low-performing students in small classes transfer more | Average of remaining small-class students ↑ → Effect overestimated |
| High-performing students in regular classes transfer more | Average of remaining regular-class students ↓ → Effect overestimated |
→ Random assignment is compromised! → Selection bias re-emerges
Mathematical Understanding
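Initially, random assignment succeeds:
E[Y0i | Di = 1] = E[Y0i | Di = 0]
After attrition, the same equality need not hold among those who remain in the sample:
E[Y0i | Di = 1, remained] may differ from E[Y0i | Di = 0, remained]
→ Those who remain may no longer be comparable!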
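The first scenario in the table can be illustrated with a simulation. This is a sketch under assumed values: the true effect of 5 points, the score distribution, and the attrition rule (treated students with low baseline scores leave 60% of the time) are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(3)

# Hypothetical experiment with a true effect of 5 points for everyone.
all_treated, all_control, stay_treated, stay_control = [], [], [], []
for _ in range(20_000):
    y0 = rng.gauss(50, 10)
    d = rng.random() < 0.5                     # random assignment
    y = y0 + 5 if d else y0
    (all_treated if d else all_control).append(y)
    # Assumed attrition rule: low scorers in the treated group leave more often.
    leaves = d and y0 < 45 and rng.random() < 0.6
    if not leaves:
        (stay_treated if d else stay_control).append(y)

full_sample = mean(all_treated) - mean(all_control)
after_attrition = mean(stay_treated) - mean(stay_control)
print(f"full sample: {full_sample:.2f}, after attrition: {after_attrition:.2f}")
```

The full-sample comparison recovers roughly 5 points, while the comparison among stayers is noticeably larger, because the remaining treated students have higher baseline scores.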
Solutions to the Attrition Problem
| Method | Description |
|---|---|
| Compare attrition rates | Check if attrition rates are similar across treatment/control groups |
| Compare attriter characteristics | Analyze who dropped out (what characteristics do attriters have?) |
| Bounds analysis | Estimate range of effects under worst/best case scenarios |
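| ITT analysis | Analyze by original assignment, regardless of the treatment actually received (Intent-to-Treat) |
ITT (Intent-to-Treat) Analysis:
- Analyze based on originally assigned group
- Ignore whether treatment was actually received
- Avoids selection bias from participants switching groups after assignment
- Requires outcome data for everyone originally assigned, so it does not by itself fix missing outcomes
- Drawback: May underestimate actual treatment effect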
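The bounds analysis listed in the table can be sketched for an outcome with a known range, such as a percentile score between 0 and 100. The idea is to fill in every missing outcome with the least and most favorable values. This is one simple worst-case/best-case version, not necessarily the procedure the source has in mind; the function name and the toy data are illustrative.

```python
def worst_case_bounds(treated_obs, n_treated_missing,
                      control_obs, n_control_missing, y_min, y_max):
    """Bounds on the mean difference when missing outcomes lie in [y_min, y_max]."""
    def mean_with_fill(observed, n_missing, fill):
        return (sum(observed) + n_missing * fill) / (len(observed) + n_missing)

    # Lower bound: attriters in the treated arm score y_min, in the control arm y_max.
    lower = (mean_with_fill(treated_obs, n_treated_missing, y_min)
             - mean_with_fill(control_obs, n_control_missing, y_max))
    # Upper bound: the reverse.
    upper = (mean_with_fill(treated_obs, n_treated_missing, y_max)
             - mean_with_fill(control_obs, n_control_missing, y_min))
    return lower, upper

# Toy example: three observed scores and one attriter per arm.
bounds = worst_case_bounds([60, 70, 80], 1, [50, 60, 70], 1, y_min=0, y_max=100)
print(bounds)  # (-17.5, 32.5)
```

With high attrition rates such bounds become wide, which is why they are usually reported alongside the other checks in the table.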
2.5 Regression Analysis of Experiments
Constant Treatment Effect Model
Assume treatment effect is the same for everyone (Y1i − Y0i = ρ):
Yi = α + ρDi + ηi
where α = E[Y0i], ρ = treatment effect, and ηi = Y0i − E[Y0i].
Selection Bias as Regression
E[Yi|Di=1] − E[Yi|Di=0] = ρ + [E[ηi|Di=1] − E[ηi|Di=0]]
- ρ: Treatment effect
- E[ηi|Di=1] − E[ηi|Di=0]: Selection bias, present when the error ηi is correlated with the regressor Di
With random assignment: Selection bias = 0 → Regression coefficient estimates causal effect
Role of Covariates
Long regression: Yi = α + ρDi + Xi'γ + η̃i
| Role | Explanation | STAR Example |
|---|---|---|
| 1. Control for conditional randomization | When randomization is within strata, control for stratification variable | Randomized within schools → Include school fixed effects |
| 2. Improve precision | Even if Xi is uncorrelated with Di, explaining Yi variance reduces SE | Race, age, free lunch → SE drops (1.26 → 1.21) |
Quasi-Experimental Approach: Angrist & Lavy (1999)
When randomized trials are impractical, use natural experiments
Setting: Israeli class size cap = 40 students (Maimonides' Rule)
- 5th grade cohort of 40 → class size = 40
- 5th grade cohort of 41 → class splits → class size ≈ 20
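The splitting rule in the two bullets above can be written as a small function. This is a minimal sketch, assuming a cohort is divided into the fewest equal-sized classes that respect the cap; the function name `predicted_class_size` is illustrative.

```python
def predicted_class_size(enrollment, cap=40):
    """Average class size when a cohort is split into the fewest classes of at most `cap`."""
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes

for enrollment in (40, 41, 80, 81):
    print(enrollment, predicted_class_size(enrollment))
# 40 -> 40.0, 41 -> 20.5, 80 -> 40.0, 81 -> 27.0
```

The sharp drops at 41 and 81 are the source of the quasi-experimental variation: cohorts just above and just below a multiple of 40 end up with very different class sizes.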
Key Assumption
Students in cohorts of 40 vs 41 are similar on other dimensions → "as good as randomly assigned"
Results Comparison
| Analysis Method | Result |
|---|---|
| Naive comparison | Small class students score lower (selection bias) |
| Quasi-experimental (RDD) | Strong negative relationship between class size and achievement: smaller classes raise test scores |
Chapter 2 Summary
| Concept | Description |
|---|---|
| Potential Outcomes | Y1i, Y0i: Hypothetical outcomes under each treatment state |
| Causal Effect | Y1i − Y0i: Individual treatment effect |
| Selection Bias | Difference in average untreated outcomes (Y0i) between the treated and untreated groups |
| Random Assignment | Makes Di independent of potential outcomes, eliminating selection bias |
| Natural Experiment | Uses exogenous variation to approximate random assignment |
Appendix: Regression Analysis of Experiments (Deep Dive)
A.1 Why Use Regression?
The simplest way to estimate treatment effects in an experiment is the difference in group means:
ρ̂ = Ȳtreated − Ȳcontrol
With regression:
Yi = α + ρDi + ηi
Here, the OLS estimate ρ̂ is identical to Ȳtreated − Ȳcontrol!
Why bother with regression?
- Easy to control for covariates
- Convenient standard error calculation
- Flexible model extensions
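The equivalence between the regression coefficient on a binary treatment and the difference in means can be verified numerically. This is a minimal sketch on hypothetical simulated data; `ols_slope` is an illustrative helper that implements the textbook bivariate OLS formula.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(4)
d = [1 if rng.random() < 0.5 else 0 for _ in range(1_000)]   # hypothetical assignment
y = [50 + 5 * di + rng.gauss(0, 10) for di in d]             # hypothetical outcomes

rho_hat = ols_slope(d, y)
diff_in_means = (mean(yi for yi, di in zip(y, d) if di == 1)
                 - mean(yi for yi, di in zip(y, d) if di == 0))
print(f"OLS slope = {rho_hat:.4f}, difference in means = {diff_in_means:.4f}")
```

The two numbers agree up to floating-point rounding, for any data set in which both groups are non-empty.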
A.2 Deriving the Constant Treatment Effect Model
Assumption: Treatment effect is identical for everyone
Y1i − Y0i = ρ
Decompose the potential outcome:
Y0i = α + ηi
where α = E[Y0i] is the mean and ηi = Y0i − E[Y0i] is the individual deviation.
Observed outcome:
Yi = Y0i + (Y1i − Y0i) · Di
= (α + ηi) + ρ · Di
= α + ρDi + ηi
| Term | Meaning |
|---|---|
| α | E[Y0i], average outcome without treatment |
| ρ | Y1i − Y0i, treatment effect |
| ηi | Y0i − E[Y0i], individual deviation from the mean untreated outcome |
A.3 Selection Bias as Regression
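Conditional expectations in the regression model:
E[Yi | Di = 1] = α + ρ + E[ηi | Di = 1]
E[Yi | Di = 0] = α + E[ηi | Di = 0]
Taking the difference:
E[Yi | Di = 1] − E[Yi | Di = 0] = ρ + E[ηi | Di = 1] − E[ηi | Di = 0]
- ρ: Treatment effect
- E[ηi | Di = 1] − E[ηi | Di = 0]: Selection bias, present when the error ηi is correlated with treatment Di
This equals the selection bias we saw earlier, because ηi = Y0i − E[Y0i]:
E[ηi | Di = 1] − E[ηi | Di = 0] = E[Y0i | Di = 1] − E[Y0i | Di = 0]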
A.4 Random Assignment → OLS Estimates Causal Effect
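Under random assignment, Di is independent of ηi:
E[ηi | Di = 1] = E[ηi | Di = 0]
Therefore:
Selection bias = E[ηi | Di = 1] − E[ηi | Di = 0] = 0
Result:
E[Yi | Di = 1] − E[Yi | Di = 0] = ρ
→ The OLS estimate ρ̂ estimates the causal effect!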
A.5 Two Roles of Adding Covariates
Long regression: Yi = α + ρDi + Xi'γ + η̃i
Role 1: Control for Conditional Random Assignment
In the STAR experiment:
- Random assignment within schools
- Not random across schools (urban vs rural)
Why necessary?
| School | Treatment Prob. | Avg. Score |
|---|---|---|
| Urban A | 40% | High |
| Rural B | 30% | Low |
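→ In this illustrative example, the school with the higher treatment probability also has higher average scores, so without school controls the estimated treatment effect also picks up between-school differences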
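The contamination can be seen in a simulation of two schools that differ in both treatment probability and baseline scores. This is a sketch under assumed values: the school names, probabilities, baseline means, and the true effect of 5 are hypothetical. The within-school estimate here is a simple average of the two within-school differences, a simplification of what a regression with school fixed effects computes.

```python
import random
from statistics import mean

rng = random.Random(5)
TRUE_EFFECT = 5.0
# Hypothetical schools: (probability of small class, baseline mean score)
schools = {"urban_a": (0.4, 60.0), "rural_b": (0.3, 40.0)}

rows = []   # (school, d, y)
for name, (p_small, base) in schools.items():
    for _ in range(5_000):
        d = 1 if rng.random() < p_small else 0      # random within school
        rows.append((name, d, base + TRUE_EFFECT * d + rng.gauss(0, 10)))

def diff_in_means(data):
    """Treated mean minus control mean for a list of (school, d, y) rows."""
    return (mean(y for _, d, y in data if d == 1)
            - mean(y for _, d, y in data if d == 0))

pooled = diff_in_means(rows)
within = mean(diff_in_means([r for r in rows if r[0] == name]) for name in schools)
print(f"pooled: {pooled:.2f}, within-school: {within:.2f}")
```

The pooled comparison overstates the effect because treated students are disproportionately drawn from the higher-scoring school, while the within-school comparison stays close to 5.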
Role 2: Improve Estimation Precision
Key principle: If Xi explains variance in Yi, residual variance decreases, reducing SE of ρ̂
Short regression: Yi = α + ρDi + ηi
Var(ρ̂) ∝ Var(ηi) / n
Long regression: Yi = α + ρDi + Xi'γ + η̃i
Var(ρ̂) ∝ Var(η̃i) / n
If Xi explains Yi well: Var(η̃i) < Var(ηi)
STAR experiment results:
| Model | Small Class Effect | Std. Error |
|---|---|---|
| School FE only (column 2) | 5.37 | 1.26 |
| School FE + student controls (column 3) | 5.36 | 1.21 |
→ Estimate nearly identical, only standard error decreases!
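The precision gain can be reproduced on simulated data by comparing the short and long regressions. This is a sketch under assumed values: a single hypothetical covariate that strongly predicts the outcome and is independent of treatment. The long regression is computed by partialling out the covariate (the Frisch-Waugh approach), which gives the same coefficient and conventional standard error as the full regression.

```python
import random
from math import sqrt
from statistics import mean

def center(v):
    """Subtract the sample mean (partials out the intercept)."""
    m = mean(v)
    return [a - m for a in v]

def residualize(v, x):
    """Residuals from regressing v on x with an intercept."""
    vc, xc = center(v), center(x)
    b = sum(a * c for a, c in zip(vc, xc)) / sum(c * c for c in xc)
    return [a - b * c for a, c in zip(vc, xc)]

def slope_and_se(y, d, n_params):
    """OLS slope of y on d (both already partialled out) and its conventional SE."""
    sdd = sum(di * di for di in d)
    rho = sum(di * yi for di, yi in zip(d, y)) / sdd
    rss = sum((yi - rho * di) ** 2 for di, yi in zip(d, y))
    return rho, sqrt(rss / (len(y) - n_params) / sdd)

rng = random.Random(6)
n = 2_000
x = [rng.gauss(0, 1) for _ in range(n)]                      # hypothetical pre-treatment covariate
d = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # random assignment
y = [50 + 5 * di + 8 * xi + rng.gauss(0, 6) for di, xi in zip(d, x)]

rho_short, se_short = slope_and_se(center(y), center(d), n_params=2)
rho_long, se_long = slope_and_se(residualize(y, x), residualize(d, x), n_params=3)
print(f"short: {rho_short:.2f} (SE {se_short:.2f})")
print(f"long:  {rho_long:.2f} (SE {se_long:.2f})")
```

In this sketch the drop in the standard error is much larger than in STAR, because the simulated covariate explains far more of the outcome variance than the student controls do.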
A.6 Key Point: Short vs Long Regression
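If random assignment succeeded, the short and long regressions give nearly the same coefficient on Di:
ρ̂(short) ≈ ρ̂(long)
Why? Because Di is uncorrelated with Xi!
Mathematically (Omitted Variable Bias formula):
ρ(short) = ρ(long) + γ'δ
where γ is the coefficient vector on Xi in the long regression and δ is the vector of coefficients from regressing each element of Xi on Di.
δ ≈ 0 under random assignment, so the bias term γ'δ vanishes.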
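The omitted variable bias formula says that the short-regression coefficient equals the long-regression coefficient plus the covariate's coefficient times the slope from regressing the covariate on treatment. With a single covariate this holds exactly in any sample, which the sketch below checks on hypothetical simulated data. The names `rho_short`, `rho_long`, `gamma_long`, and `delta` are illustrative labels for those four quantities.

```python
import random
from statistics import mean

def slope(y, x):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    y_bar, x_bar = mean(y), mean(x)
    return (sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x))
            / sum((b - x_bar) ** 2 for b in x))

def residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    b, y_bar, x_bar = slope(y, x), mean(y), mean(x)
    return [(a - y_bar) - b * (c - x_bar) for a, c in zip(y, x)]

rng = random.Random(7)
n = 5_000
d = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # randomized treatment
x = [rng.gauss(0, 1) for _ in range(n)]                      # covariate, independent of d
y = [50 + 5 * di + 8 * xi + rng.gauss(0, 6) for di, xi in zip(d, x)]

rho_short = slope(y, d)
rho_long = slope(residuals(y, x), residuals(d, x))     # Frisch-Waugh: partial out x
gamma_long = slope(residuals(y, d), residuals(x, d))   # Frisch-Waugh: partial out d
delta = slope(x, d)                                    # regression of x on d

print(f"short = {rho_short:.4f}, long + gamma*delta = {rho_long + gamma_long * delta:.4f}")
print(f"delta = {delta:.4f}")
```

Because treatment is randomized here, `delta` is close to zero, so the short and long coefficients nearly coincide.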
A.7 Summary
| Scenario | Regression Result |
|---|---|
| Random assignment ✓ | ρ̂ = Causal effect (ATE) |
| Random assignment ✗ | ρ̂ = Causal effect + Selection bias |
| Add covariates (under randomization) | Same estimate, smaller SE |
| Add covariates (conditional randomization) | Required (removes bias) |
References
- Krueger, A. B. (1999). Experimental estimates of education production functions. QJE.
- Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size. QJE.
- Rubin, D. B. (1974). Estimating causal effects of treatments. Journal of Educational Psychology.
- Holland, P. W. (1986). Statistics and causal inference. JASA.
- Lalonde, R. J. (1986). Evaluating the econometric evaluations of training programs. AER.
Suhyeon Lee