Chapter 2: The Experimental Ideal

Angrist & Pischke, Mostly Harmless Econometrics

Core Message

The most credible and influential research designs use random assignment.

2.1 The Selection Problem

Motivating Example: Do hospitals make people healthier?

Comparing self-reported health status by hospitalization using NHIS data (in these notes, a higher score means worse health):

| Group | Sample Size | Mean Health Status | Std. Error |
|---|---|---|---|
| Hospitalized | 7,774 | 2.79 | 0.014 |
| Not Hospitalized | 90,049 | 2.07 | 0.003 |

Difference: 0.72 (t-stat = 58.9) → Hospitals appear to make people sicker!

Why this result? People who go to hospitals are sicker to begin with.

The Potential Outcomes Framework

Core concept of the Rubin Causal Model (Rubin, 1974, 1977; Holland, 1986)

Notation:

  • Di ∈ {0, 1}: Treatment status (e.g., hospitalization)
  • Yi: Observed outcome
  • Y1i: Potential outcome if treated
  • Y0i: Potential outcome if not treated

Causal effect for individual i: Y1i − Y0i

Observed outcome:

Yi = Y0i + (Y1i − Y0i) · Di
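The switching equation above can be sketched in a few lines. This is only an illustration: the `Person` record and the numbers are hypothetical, and real data never contain both potential outcomes for the same individual.

```python
from dataclasses import dataclass


@dataclass
class Person:
    y0: float  # potential outcome if not treated (Y0i)
    y1: float  # potential outcome if treated (Y1i)
    d: int     # treatment status Di, 0 or 1


def observed_outcome(p: Person) -> float:
    """Yi = Y0i + (Y1i - Y0i) * Di: treatment status selects which outcome is revealed."""
    return p.y0 + (p.y1 - p.y0) * p.d


# Made-up individuals, for illustration only.
treated = Person(y0=3.5, y1=2.5, d=1)
untreated = Person(y0=2.0, y1=1.5, d=0)

print(observed_outcome(treated))    # 2.5 (Y1i is revealed)
print(observed_outcome(untreated))  # 2.0 (Y0i is revealed)
```

For each person the other potential outcome stays hidden, which is why the individual effect Y1i − Y0i cannot be computed from observed data.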

Formal Decomposition of Selection Bias (Step by Step)

Step 1: Starting Point

What we can observe:

E[Yi | Di = 1] − E[Yi | Di = 0]

"Average health of those who went to hospital" − "Average health of those who didn't"

Step 2: Replace observed Y with potential outcomes

Key: For Di = 1, we only observe Y1i. For Di = 0, we only observe Y0i.

E[Yi | Di = 1] = E[Y1i | Di = 1]
E[Yi | Di = 0] = E[Y0i | Di = 0]

Therefore:

E[Yi|Di=1] − E[Yi|Di=0] = E[Y1i|Di=1] − E[Y0i|Di=0]

Step 3: The Trick! Add and subtract the same term

Add and subtract E[Y0i | Di = 1] (= adding zero):

= E[Y1i|Di=1] − E[Y0i|Di=1] + E[Y0i|Di=1] − E[Y0i|Di=0]

(The two middle terms, − E[Y0i|Di=1] + E[Y0i|Di=1], cancel to zero.)

Step 4: Rearrange terms

= (E[Y1i|Di=1] − E[Y0i|Di=1]) + (E[Y0i|Di=1] − E[Y0i|Di=0])

= E[Y1i − Y0i | Di=1] + E[Y0i|Di=1] − E[Y0i|Di=0]

Step 5: Meaning of each term

| Term | Formula | Meaning |
|---|---|---|
| ATT | E[Y1i − Y0i \| Di=1] | Average treatment effect on the treated |
| Selection Bias | E[Y0i\|Di=1] − E[Y0i\|Di=0] | Baseline difference without treatment |

Intuitive Understanding

ATT (Average Treatment effect on the Treated):

  • E[Y1i | Di = 1]: Health of hospitalized people (after going)
  • E[Y0i | Di = 1]: Health they would have had if they hadn't gone
  • Difference = The true effect of the hospital

Selection Bias:

  • E[Y0i | Di = 1]: Health hospitalized people would have even without going (originally sick)
  • E[Y0i | Di = 0]: Health of non-hospitalized people (originally healthy)
  • Difference = Gap from comparing different people
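A small simulation can make the two terms concrete. The sketch below assumes a made-up population in which both potential outcomes are known (never true in practice), hospitalization improves health by 0.5 points, and sicker people are more likely to be hospitalized. All parameter values are assumptions chosen for illustration.

```python
import random
from statistics import fmean

rng = random.Random(0)

# Hypothetical population; higher score = worse health.
people = []
for _ in range(10_000):
    y0 = rng.gauss(2.5, 1.0)                      # health without hospitalization
    y1 = y0 - 0.5                                 # hospitalization improves health by 0.5
    d = 1 if y0 + rng.gauss(0, 1) > 3.5 else 0    # sicker people select into treatment
    people.append((y0, y1, d))

treated = [(y0, y1) for y0, y1, d in people if d == 1]
control = [(y0, y1) for y0, y1, d in people if d == 0]

# What the data show: Y1i for the treated, Y0i for the untreated.
naive = fmean(y1 for _, y1 in treated) - fmean(y0 for y0, _ in control)

# The two pieces of the decomposition (computable only because this is a simulation).
att = fmean(y1 - y0 for y0, y1 in treated)
selection_bias = fmean(y0 for y0, _ in treated) - fmean(y0 for y0, _ in control)

print(f"naive = {naive:.3f}, ATT = {att:.3f}, selection bias = {selection_bias:.3f}")
```

In this setup the naive comparison is positive even though the true effect is negative, because the selection bias term outweighs the ATT.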

Numerical Example

| | Hospitalized | Not Hospitalized |
|---|---|---|
| Observed health E[Yi\|Di] | 2.79 | 2.07 |
| Health if not hospitalized E[Y0i\|Di] | 3.50 (unobserved; illustrative value) | 2.07 |

Observed difference: 2.79 − 2.07 = 0.72

Decomposition:

  • ATT = 2.79 − 3.50 = −0.71 (hospital makes people healthier!)
  • Selection Bias = 3.50 − 2.07 = +1.43 (sicker people go to hospital)

Observed = ATT + Selection Bias
0.72 = (−0.71) + (+1.43)

Selection bias (+1.43) completely masks the true effect (−0.71)!
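The arithmetic of this example can be checked with a short script. The function name `decompose` is hypothetical, and the counterfactual value 3.50 is the illustrative number from the table, not something that can be measured.

```python
def decompose(observed_treated, observed_control, counterfactual_treated):
    """Split the observed gap into ATT and selection bias.

    counterfactual_treated stands for E[Y0i | Di = 1], which is never observed;
    it has to be supplied as an assumption.
    """
    att = observed_treated - counterfactual_treated
    selection_bias = counterfactual_treated - observed_control
    observed_gap = observed_treated - observed_control
    return att, selection_bias, observed_gap


att, selection_bias, observed_gap = decompose(2.79, 2.07, 3.50)
print(round(att, 2), round(selection_bias, 2), round(observed_gap, 2))
```

Changing the assumed counterfactual changes both terms but never their sum, which always equals the observed gap.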

2.2 Random Assignment Solves the Selection Problem

Key Principle: Random assignment makes Di independent of potential outcomes.

Mathematical Derivation

Under random assignment:

E[Yi|Di=1] − E[Yi|Di=0]

= E[Y1i|Di=1] − E[Y0i|Di=0]

= E[Y1i|Di=1] − E[Y0i|Di=1] (by independence)

= E[Y1i − Y0i|Di=1]

= E[Y1i − Y0i] (= ATE, Average Treatment Effect)

Selection bias disappears, and we can directly estimate the ATE!
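The derivation can be illustrated with a simulation. The sketch below assumes made-up potential outcomes with heterogeneous effects averaging 2, and assigns treatment by coin flip so that Di is independent of (Y0i, Y1i) by construction.

```python
import random
from statistics import fmean

rng = random.Random(1)
n = 20_000

# Hypothetical potential outcomes; individual effects vary around 2.
y0 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [a + 2 + rng.gauss(0, 0.5) for a in y0]

# Coin-flip assignment, unrelated to either potential outcome.
d = [rng.randint(0, 1) for _ in range(n)]

# Observed outcome: Y1i if treated, Y0i otherwise.
y = [b if t == 1 else a for a, b, t in zip(y0, y1, d)]

diff_in_means = (fmean(v for v, t in zip(y, d) if t == 1)
                 - fmean(v for v, t in zip(y, d) if t == 0))
ate = fmean(b - a for a, b in zip(y0, y1))  # knowable only in a simulation

print(f"difference in means = {diff_in_means:.3f}, ATE = {ate:.3f}")
```

The two numbers differ only by sampling noise, which shrinks as n grows.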

Empirical Examples: Non-experimental vs. Randomized Studies

| Research Area | Non-experimental Finding | Randomized Trial Result |
|---|---|---|
| Hormone Replacement Therapy | Nurses Health Study: HRT users healthier | WHI: Few benefits, serious side effects |
| Job Training Programs | Trainees earn less than non-trainees | Mostly positive effects (Lalonde, 1986) |

2.3 The Tennessee STAR Experiment

Experiment Overview

  • Purpose: Estimate effects of class size on student achievement
  • Duration: Started 1985/86, ran for 4 years (K through 3rd grade)
  • Scale: ~11,600 students, cost ~$12 million
  • Treatment Arms:
    1. Small classes (13-17 students)
    2. Regular classes (22-25) with part-time aide
    3. Regular classes with full-time aide

Balance Check: Verifying Random Assignment

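Compare student characteristics across groups (the P-value tests equality across the three arms):

| Variable | Small | Regular | Reg/Aide | P-value |
|---|---|---|---|---|
| Free lunch | .47 | .48 | .50 | .09 |
| White/Asian | .68 | .67 | .66 | .26 |
| Age in 1985 | 5.44 | 5.43 | 5.42 | .32 |
| K class size | 15.10 | 22.40 | 22.80 | .00 |
| K percentile score | 54.70 | 48.90 | 50.00 | .00 |

✅ Pre-treatment characteristics (free lunch, race, age) are balanced → Random assignment worked

The last two rows are not balance checks: kindergarten class size shows that assignment actually changed class size, and the kindergarten percentile score is an outcome.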

Main Results

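Standard errors in parentheses.

| Variable | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Small class | 4.82 (2.19) | 5.37 (1.26) | 5.36 (1.21) | 5.37 (1.19) |
| Regular/aide | .12 (2.23) | .29 (1.13) | .53 (1.09) | .31 (1.07) |
| School FE | No | Yes | Yes | Yes |
| Student controls | No | No | Yes | Yes |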

Key Findings:

  • Small class effect: ~5-6 percentile points improvement
  • Effect size: ~0.2 standard deviations (σ)
  • Regular/aide effect: Small and statistically insignificant

2.4 The Attrition Problem

Definition

Attrition: Participants dropping out during the course of an experiment

Attrition in the STAR Experiment

| Time Point | Number of Students |
|---|---|
| Start (Kindergarten) | ~11,600 |
| End (3rd Grade) | Some attrition |

Reasons for attrition:

  • School transfers
  • Dropping out
  • Refusal to continue participation
  • Missing data

Why Is This a Problem?

Key issue: Attrition may not be random!

| Scenario | Problem |
|---|---|
| Low-performing students in small classes transfer more | Average of remaining small-class students ↑ → Effect overestimated |
| High-performing students in regular classes transfer more | Average of remaining regular-class students ↓ → Effect overestimated |

Random assignment is compromised! → Selection bias re-emerges

Mathematical Understanding

Initially, random assignment succeeds:

E[Y0i | Di = 1] = E[Y0i | Di = 0]

After attrition:

E[Y0i | Di = 1, Stayer] ≠ E[Y0i | Di = 0, Stayer]

→ Those who remain may no longer be comparable!
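The following simulation sketches how non-random attrition can undo a successful randomization. Every number here is an assumption for illustration: a true effect of 5 points, and a hypothetical rule under which low scorers assigned to small classes leave with probability 0.5.

```python
import random
from statistics import fmean

rng = random.Random(2)
n = 20_000
true_effect = 5.0

rows = []  # (observed score, treatment, stays in sample)
for _ in range(n):
    y0 = rng.gauss(50, 10)          # score the student would get in a regular class
    d = rng.randint(0, 1)           # random assignment to a small class
    y = y0 + true_effect * d
    # Hypothetical attrition rule: low scorers leave small classes more often.
    leaves = d == 1 and y0 < 45 and rng.random() < 0.5
    rows.append((y, d, not leaves))


def gap(sample):
    """Difference in mean observed scores, treated minus control."""
    return (fmean(y for y, d, _ in sample if d == 1)
            - fmean(y for y, d, _ in sample if d == 0))


full_sample_gap = gap(rows)                    # close to the true effect
stayer_gap = gap([r for r in rows if r[2]])    # inflated by selective attrition

print(f"full sample: {full_sample_gap:.2f}, stayers only: {stayer_gap:.2f}")
```

Among stayers, the treated group has a higher average Y0i than the control group, so the comparison overstates the effect.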

Solutions to the Attrition Problem

| Method | Description |
|---|---|
| Compare attrition rates | Check whether attrition rates are similar across treatment and control groups |
| Compare attriter characteristics | Analyze who dropped out and whether attriters differ across groups |
| Bounds analysis | Estimate the range of effects under worst-case and best-case scenarios for the missing outcomes |
| ITT analysis | Analyze by original assignment, regardless of the treatment actually received (Intent-to-Treat) |

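ITT (Intent-to-Treat) Analysis:

  • Analyze based on the originally assigned group
  • Ignore whether treatment was actually received
  • Avoids the selection bias that arises when participants switch groups after assignment; it protects against attrition bias only when outcomes are still observed (or can be imputed) for those who leave
  • Drawback: May underestimate the actual treatment effect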

2.5 Regression Analysis of Experiments

Constant Treatment Effect Model

Assume treatment effect is the same for everyone (Y1i − Y0i = ρ):

Yi = α + ρ Di + ηi

α = E(Y0i)    ρ = treatment effect    ηi = Y0i − E(Y0i)

Selection Bias as Regression

E[Yi|Di=1] − E[Yi|Di=0] = ρ + [E[ηi|Di=1] − E[ηi|Di=0]]

  • ρ: Treatment effect
  • Bracketed term: Selection bias, i.e., correlation between the error ηi and the regressor Di

With random assignment: Selection bias = 0 → Regression coefficient estimates causal effect

Role of Covariates

Long regression:

Yi = α + ρDi + Xi'γ + ηi

| Role | Explanation | STAR Example |
|---|---|---|
| 1. Control for conditional randomization | When randomization is within strata, control for the stratification variable | Randomized within schools → Include school fixed effects |
| 2. Improve precision | Even if Xi is uncorrelated with Di, explaining variance in Yi reduces the SE | Race, age, free lunch → SE drops (1.26 → 1.21) |

Quasi-Experimental Approach: Angrist & Lavy (1999)

When randomized trials are impractical, use natural experiments

Setting: Israeli class size cap = 40 students (Maimonides' Rule)

  • 5th grade cohort of 40 → class size = 40
  • 5th grade cohort of 41 → class splits → class size ≈ 20
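The rule can be written as a small function. The formula below, enrollment divided by the smallest number of classes that respects the cap, is the one given in Angrist and Lavy (1999) rather than in these notes, and the function name is hypothetical.

```python
def predicted_class_size(enrollment: int, cap: int = 40) -> float:
    """Average class size when a cohort is split into the fewest classes allowed by the cap."""
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes


for cohort in (40, 41, 80, 81):
    print(cohort, predicted_class_size(cohort))
# 40 40.0
# 41 20.5
# 80 40.0
# 81 27.0
```

The sharp drops at 41 and 81 are what the quasi-experiment exploits: cohorts on either side of a cutoff differ by one student but by many students per class.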

Key Assumption

Students in cohorts of 40 vs 41 are similar on other dimensions → "as good as randomly assigned"

Results Comparison

| Analysis Method | Result |
|---|---|
| Naive comparison | Students in smaller classes score lower (selection bias) |
| Quasi-experimental (RDD) | Strong negative relationship between class size and achievement: smaller classes raise scores |

Chapter 2 Summary

| Concept | Description |
|---|---|
| Potential Outcomes | Y1i, Y0i: Hypothetical outcomes under each treatment state |
| Causal Effect | Y1i − Y0i: Individual treatment effect |
| Selection Bias | Difference in average untreated outcomes Y0i between treated and untreated |
| Random Assignment | Makes Di independent of potential outcomes, eliminating selection bias |
| Natural Experiment | Uses exogenous variation to approximate random assignment |

Appendix: Regression Analysis of Experiments (Deep Dive)

A.1 Why Use Regression?

The simplest way to estimate treatment effects in an experiment:

Ȳtreated − Ȳcontrol

With regression:

Yi = α + ρDi + ηi

Here, ρ̂ is identical to Ȳtreated − Ȳcontrol!
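This equivalence is easy to verify numerically. The sketch below uses a handful of made-up scores and computes the OLS slope by hand as Cov(D, Y) / Var(D); the helper name `ols_slope` is an assumption for illustration.

```python
from statistics import mean


def ols_slope(d, y):
    """Slope from regressing y on a constant and d: Cov(d, y) / Var(d)."""
    d_bar, y_bar = mean(d), mean(y)
    cov = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var = sum((di - d_bar) ** 2 for di in d)
    return cov / var


# Made-up data for illustration only.
d = [1, 1, 1, 0, 0, 0, 0]
y = [58.0, 61.0, 55.0, 50.0, 47.0, 52.0, 49.0]

diff_in_means = (mean(yi for di, yi in zip(d, y) if di == 1)
                 - mean(yi for di, yi in zip(d, y) if di == 0))
rho_hat = ols_slope(d, y)

print(diff_in_means, round(rho_hat, 6))
```

With a single binary regressor the two quantities agree up to floating-point rounding, whatever the data.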

Why bother with regression?

  • Easy to control for covariates
  • Convenient standard error calculation
  • Flexible model extensions

A.2 Deriving the Constant Treatment Effect Model

Assumption: Treatment effect is identical for everyone

Y1i − Y0i = ρ (constant)

Decompose the potential outcome:

Y0i = E[Y0i] + (Y0i − E[Y0i])

Y0i = α + ηi

α = mean    ηi = individual deviation

Observed outcome:

Yi = Y0i + (Y1i − Y0i) · Di
   = (α + ηi) + ρ · Di
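   = α + ρDi + ηi

| Term | Meaning |
|---|---|
| α | E[Y0i], average outcome without treatment |
| ρ | Y1i − Y0i, treatment effect (constant across individuals) |
| ηi | Y0i − E[Y0i], individual deviation from the mean (the random part of Y0i) |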

A.3 Selection Bias as Regression

Conditional expectations in the regression model:

E[Yi | Di = 1] = α + ρ + E[ηi | Di = 1]
E[Yi | Di = 0] = α + E[ηi | Di = 0]

Taking the difference:

E[Yi|Di=1] − E[Yi|Di=0] = ρ + (E[ηi|Di=1] − E[ηi|Di=0])

  • ρ: Treatment effect
  • Term in parentheses: Selection bias, i.e., correlation between the error ηi and the treatment Di

This equals the selection bias we saw earlier:

E[ηi|Di=1] − E[ηi|Di=0] = E[Y0i|Di=1] − E[Y0i|Di=0]
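The equality follows in one step from the definition ηi = Y0i − E[Y0i]: the constant α = E[Y0i] appears in both conditional means and cancels in the difference. Written out in LaTeX, using only the symbols defined above:

```latex
\begin{aligned}
E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]
  &= \bigl(E[Y_{0i} \mid D_i = 1] - \alpha\bigr) - \bigl(E[Y_{0i} \mid D_i = 0] - \alpha\bigr) \\
  &= E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0].
\end{aligned}
```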

A.4 Random Assignment → OLS Estimates Causal Effect

Under random assignment:

Di ⊥ ηi

Therefore:

E[ηi | Di = 1] = E[ηi | Di = 0] = E[ηi] = 0

Result:

E[Yi | Di = 1] − E[Yi | Di = 0] = ρ

→ The difference in conditional means equals ρ, so the OLS estimate ρ̂ estimates the causal effect!

A.5 Two Roles of Adding Covariates

Long regression:

Yi = α + ρDi + Xi'γ + ηi

Role 1: Control for Conditional Random Assignment

In the STAR experiment:

  • Random assignment within schools
  • Not random across schools: assignment probabilities can differ by school (e.g., urban vs rural)

Yi = α + ρDi + Σj δj · 𝟙[Schooli = j] + ηi

Why necessary?

| School | Treatment Prob. | Avg. Score |
|---|---|---|
| Urban A | 40% | High |
| Rural B | 30% | Low |

→ Without school controls, the estimate mixes the treatment effect with between-school differences in average scores
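A simulation along the lines of the urban/rural table shows the contamination. The sketch assumes two hypothetical schools with treatment probabilities of 40% and 30%, baseline mean scores of 60 and 40, and a true effect of 5; school fixed effects are implemented by demeaning D and Y within each school, which gives the same coefficient as including school dummies.

```python
import random
from statistics import fmean

rng = random.Random(3)
true_effect = 5.0
# Hypothetical schools: (name, treatment probability, mean untreated score).
schools = [("urban", 0.4, 60.0), ("rural", 0.3, 40.0)]

rows = []  # (school, d, y)
for name, p_treat, base in schools:
    for _ in range(20_000):
        d = 1 if rng.random() < p_treat else 0   # random within each school
        y = base + rng.gauss(0, 10) + true_effect * d
        rows.append((name, d, y))


def slope(d, y):
    """OLS slope of y on d (with intercept)."""
    d_bar, y_bar = fmean(d), fmean(y)
    num = sum((a - d_bar) * (b - y_bar) for a, b in zip(d, y))
    return num / sum((a - d_bar) ** 2 for a in d)


# Pooled regression that ignores schools.
pooled = slope([r[1] for r in rows], [r[2] for r in rows])

# School fixed effects via the within transformation.
d_within, y_within = [], []
for name, _, _ in schools:
    d_s = [r[1] for r in rows if r[0] == name]
    y_s = [r[2] for r in rows if r[0] == name]
    d_bar, y_bar = fmean(d_s), fmean(y_s)
    d_within += [a - d_bar for a in d_s]
    y_within += [b - y_bar for b in y_s]
with_school_fe = slope(d_within, y_within)

print(f"pooled: {pooled:.2f}, with school FE: {with_school_fe:.2f}")
```

The pooled estimate is too large because treated students are disproportionately from the high-scoring school; comparing within schools removes that difference.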

Role 2: Improve Estimation Precision

Key principle: If Xi explains variance in Yi, residual variance decreases, reducing SE of ρ̂

Short regression: Yi = α + ρDi + ηi

Var(ρ̂) ∝ Var(ηi) / n


Long regression: Yi = α + ρDi + Xi'γ + η̃i

Var(ρ̂) ∝ Var(η̃i) / n

If Xi explains Yi well: Var(η̃i) < Var(ηi)

STAR experiment results:

| Model | Small Class Effect | Std. Error |
|---|---|---|
| School FE only (column 2) | 5.37 | 1.26 |
| School FE + student controls (column 3) | 5.36 | 1.21 |

→ Estimate nearly identical, only standard error decreases!
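The same pattern appears in simulated data. The sketch below is not the STAR data: it assumes a made-up outcome with a true effect of 5 and one pre-treatment covariate that explains much of the variance, and it computes conventional (homoskedastic) standard errors by hand.

```python
import random
from statistics import fmean

rng = random.Random(4)
n = 10_000

x = [rng.gauss(0, 1) for _ in range(n)]        # pre-treatment covariate
d = [rng.randint(0, 1) for _ in range(n)]      # randomized, independent of x
y = [50 + 5 * di + 8 * xi + rng.gauss(0, 6) for di, xi in zip(d, x)]


def demean(v):
    m = fmean(v)
    return [a - m for a in v]


def cross(a, b):
    """Sum of cross-products of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


dd, xx, yy = demean(d), demean(x), demean(y)
s_dd, s_xx, s_dx = cross(dd, dd), cross(xx, xx), cross(dd, xx)
s_dy, s_xy = cross(dd, yy), cross(xx, yy)

# Short regression: y on d.
rho_short = s_dy / s_dd
resid_short = [yi - rho_short * di for di, yi in zip(dd, yy)]
se_short = (cross(resid_short, resid_short) / (n - 2) / s_dd) ** 0.5

# Long regression: y on d and x, solving the 2x2 normal equations.
det = s_dd * s_xx - s_dx ** 2
rho_long = (s_dy * s_xx - s_dx * s_xy) / det
gamma = (s_xy * s_dd - s_dx * s_dy) / det
resid_long = [yi - rho_long * di - gamma * xi for di, xi, yi in zip(dd, xx, yy)]
se_long = (cross(resid_long, resid_long) / (n - 3) * s_xx / det) ** 0.5

print(f"short: {rho_short:.2f} (SE {se_short:.3f})")
print(f"long:  {rho_long:.2f} (SE {se_long:.3f})")
```

Both regressions recover roughly the same effect; the covariate only absorbs residual variance, which tightens the standard error.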

A.6 Key Point: Short vs Long Regression

If random assignment succeeded:

ρ̂short ≈ ρ̂long

Why? Because Di is uncorrelated with Xi!

Mathematically (Omitted Variable Bias formula):

ρ̂short = ρ̂long + γ̂ · Cov(Di, Xi) / Var(Di)

(The second term, γ̂ · Cov(Di, Xi) / Var(Di), is ≈ 0 under random assignment.)
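The omitted variable bias formula holds exactly in any sample, which a short script can confirm. The data below are simulated under assumed parameters (true effect 5, covariate coefficient 8) with a single covariate, so the long regression can be solved from 2x2 normal equations.

```python
import random
from statistics import fmean

rng = random.Random(5)
n = 10_000

x = [rng.gauss(0, 1) for _ in range(n)]
d = [rng.randint(0, 1) for _ in range(n)]      # randomized, so Cov(d, x) is near 0
y = [50 + 5 * di + 8 * xi + rng.gauss(0, 6) for di, xi in zip(d, x)]


def demean(v):
    m = fmean(v)
    return [a - m for a in v]


def cross(a, b):
    """Sum of cross-products of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


dd, xx, yy = demean(d), demean(x), demean(y)
s_dd, s_xx, s_dx = cross(dd, dd), cross(xx, xx), cross(dd, xx)
s_dy, s_xy = cross(dd, yy), cross(xx, yy)

rho_short = s_dy / s_dd                          # y on d only
det = s_dd * s_xx - s_dx ** 2
rho_long = (s_dy * s_xx - s_dx * s_xy) / det     # coefficient on d, controlling for x
gamma_long = (s_xy * s_dd - s_dx * s_dy) / det   # coefficient on x
ovb_term = gamma_long * s_dx / s_dd              # gamma * Cov(d, x) / Var(d)

print(f"short = {rho_short:.4f}, long + OVB term = {rho_long + ovb_term:.4f}")
print(f"OVB term = {ovb_term:.4f}")
```

The identity is algebraic; randomization matters only because it makes the OVB term small.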

A.7 Summary

| Scenario | Regression Result |
|---|---|
| Random assignment ✓ | ρ̂ = Causal effect (ATE) |
| Random assignment ✗ | ρ̂ = Causal effect + Selection bias |
| Add covariates (under randomization) | Same estimate, smaller SE |
| Add covariates (conditional randomization) | Required (removes bias) |

References

  • Krueger, A. B. (1999). Experimental estimates of education production functions. QJE.
  • Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size. QJE.
  • Rubin, D. B. (1974). Estimating causal effects of treatments. Journal of Educational Psychology.
  • Holland, P. W. (1986). Statistics and causal inference. JASA.
  • Lalonde, R. J. (1986). Evaluating the econometric evaluations of training programs. AER.
This note was written with the assistance of an LLM (Claude).