Angrist Ch.2 - The Experimental Ideal

Chapter 2: The Experimental Ideal


Angrist & Pischke, Mostly Harmless Econometrics

Core Message

The most credible and influential research designs use random assignment.

2.1 The Selection Problem

Motivating Example: Do hospitals make people healthier?

Comparing health status by hospitalization using NHIS data (self-rated health coded 1 = excellent to 5 = poor, so higher values mean worse health):

Group	Sample Size	Mean Health Status	Std. Error
Hospitalized	7,774	2.79	0.014
Not Hospitalized	90,049	2.07	0.003

Difference: 0.71 (reported from unrounded means; t-stat = 58.9) → Hospitals appear to make people sicker!

Why this result? People who go to hospitals are sicker to begin with.

The Potential Outcomes Framework

Core concept of the Rubin Causal Model (Rubin, 1974, 1977; Holland, 1986)

Notation:

D_i ∈ {0, 1}: Treatment status (e.g., hospitalization)
Y_i: Observed outcome
Y_1i: Potential outcome if treated
Y_0i: Potential outcome if not treated

Causal effect for individual i: Y_1i − Y_0i

Observed outcome:

Y_i = Y_0i + (Y_1i − Y_0i) · D_i

Formal Decomposition of Selection Bias (Step by Step)

Step 1: Starting Point

What we can observe:

E[Y_i | D_i = 1] − E[Y_i | D_i = 0]

"Average health of those who went to hospital" − "Average health of those who didn't"

Step 2: Replace observed Y with potential outcomes

Key: For D_i = 1, we only observe Y_1i. For D_i = 0, we only observe Y_0i.

E[Y_i | D_i = 1] = E[Y_1i | D_i = 1]
E[Y_i | D_i = 0] = E[Y_0i | D_i = 0]

Therefore:

E[Y_i|D_i=1] − E[Y_i|D_i=0] = E[Y_1i|D_i=1] − E[Y_0i|D_i=0]

Step 3: The Trick! Add and subtract the same term

Add and subtract E[Y_0i | D_i = 1] (= adding zero):

= E[Y_1i|D_i=1] − E[Y_0i|D_i=1] + E[Y_0i|D_i=1] − E[Y_0i|D_i=0]
(the middle pair, − E[Y_0i|D_i=1] + E[Y_0i|D_i=1], sums to zero)

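Step 4: Rearrange terms

E[Y_i|D_i=1] − E[Y_i|D_i=0] = E[Y_1i − Y_0i | D_i=1] + (E[Y_0i|D_i=1] − E[Y_0i|D_i=0])

Observed difference = ATT + Selection Bias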

Step 5: Meaning of each term

Term	Formula	Meaning
ATT	E[Y_1i − Y_0i | D_i=1]	Average treatment effect on the treated
Selection Bias	E[Y_0i|D_i=1] − E[Y_0i|D_i=0]	Baseline difference without treatment
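The decomposition is an exact accounting identity, and a small simulation can confirm it. The following is a minimal sketch in Python (numpy). Everything in it is an assumption made up for illustration: the latent `sickness` variable, the outcome model, a constant hospital effect of −0.3, and the selection rule. Scores use the hospital example's direction, so higher means worse health.

```python
import numpy as np

# Hypothetical data-generating process (illustration only, not NHIS data).
rng = np.random.default_rng(0)
n = 100_000
sickness = rng.normal(size=n)  # latent health problems

# Potential outcomes on a "higher = worse health" scale.
y0 = 2.5 + 0.8 * sickness + rng.normal(scale=0.5, size=n)  # health without hospital
y1 = y0 - 0.3                                              # hospital lowers the score by 0.3

# Self-selection: sicker people are more likely to be hospitalized.
d = (sickness + rng.normal(size=n) > 1.0).astype(int)

# Observed outcome: Y_i = Y_0i + (Y_1i - Y_0i) * D_i
y = y0 + (y1 - y0) * d

observed_diff = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()                          # E[Y_1i - Y_0i | D_i = 1]
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # E[Y_0i|D_i=1] - E[Y_0i|D_i=0]

print(f"observed difference: {observed_diff:+.3f}")
print(f"ATT:                 {att:+.3f}")
print(f"selection bias:      {selection_bias:+.3f}")
# Observed = ATT + selection bias holds exactly in the sample, not just on average.
assert np.isclose(observed_diff, att + selection_bias)
```

With real data, only `observed_diff` is computable. `att` and `selection_bias` both require the treated group's Y_0i, which the simulation knows only because it generated it.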

Intuitive Understanding

ATT (Average Treatment effect on the Treated):

E[Y_1i | D_i = 1]: Health of hospitalized people (after going)
E[Y_0i | D_i = 1]: Health they would have had if they hadn't gone
Difference = The true effect of the hospital

Selection Bias:

E[Y_0i | D_i = 1]: Health hospitalized people would have even without going (originally sick)
E[Y_0i | D_i = 0]: Health of non-hospitalized people (originally healthy)
Difference = Gap from comparing different people

Numerical Example (hypothetical)

	Hospitalized	Not Hospitalized
Observed health E[Y_i|D_i]	2.79	2.07
Health if not hospitalized E[Y_0i|D_i]	3.50 (unobserved; assumed for illustration)	2.07

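Observed difference: 2.79 − 2.07 = 0.72 (the 0.71 reported above is computed from unrounded means)

Decomposition:

ATT = 2.79 − 3.50 = −0.71 (a lower score means better health, so the hospital makes people healthier!)
Selection Bias = 3.50 − 2.07 = +1.43 (sicker people go to the hospital)

0.72 = (−0.71) + 1.43
Observed = ATT + Selection Bias

→ Selection bias (+1.43) does more than mask the true effect (−0.71): it flips the sign of the observed difference!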
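The same arithmetic can be written as a small check. The 3.50 counterfactual is the assumed, unobservable value from the example, not data.

```python
# Values from the hypothetical example; 3.50 is an assumed counterfactual, not data.
treated_observed = 2.79        # E[Y_i | D_i = 1] = E[Y_1i | D_i = 1]
control_observed = 2.07        # E[Y_i | D_i = 0] = E[Y_0i | D_i = 0]
treated_counterfactual = 3.50  # E[Y_0i | D_i = 1], unobservable in practice

att = treated_observed - treated_counterfactual
selection_bias = treated_counterfactual - control_observed
observed_diff = treated_observed - control_observed

print(f"ATT = {att:+.2f}, selection bias = {selection_bias:+.2f}, observed = {observed_diff:+.2f}")
# ATT = -0.71, selection bias = +1.43, observed = +0.72
```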

2.2 Random Assignment Solves the Selection Problem

Key Principle: Random assignment makes D_i independent of potential outcomes.

Mathematical Derivation

Under random assignment:

E[Y_i|D_i=1] − E[Y_i|D_i=0]

= E[Y_1i|D_i=1] − E[Y_0i|D_i=0]

= E[Y_1i|D_i=1] − E[Y_0i|D_i=1] (by independence)

= E[Y_1i − Y_0i|D_i=1]

= E[Y_1i − Y_0i] (= ATE, Average Treatment Effect; independence again implies ATT = ATE)

→ Selection bias disappears, and we can directly estimate the ATE!
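The following sketch contrasts self-selection with coin-flip assignment. Both use the same hypothetical potential outcomes: an assumed model with a constant effect of −0.3, so ATT = ATE. Only the assignment rule changes.

```python
import numpy as np

# Hypothetical potential outcomes (assumed model, illustration only).
rng = np.random.default_rng(1)
n = 200_000
sickness = rng.normal(size=n)
y0 = 2.5 + 0.8 * sickness + rng.normal(scale=0.5, size=n)
y1 = y0 - 0.3                  # constant effect, so ATT = ATE = -0.3
ate = (y1 - y0).mean()

def naive_diff(d):
    """Difference in observed means for a 0/1 treatment vector d."""
    y = y0 + (y1 - y0) * d     # observed outcome
    return y[d == 1].mean() - y[d == 0].mean()

d_selected = (sickness + rng.normal(size=n) > 1.0).astype(int)  # sicker people opt in
d_random = rng.integers(0, 2, size=n)                           # coin flip, independent of (y0, y1)

print(f"ATE:                      {ate:+.3f}")
print(f"self-selected comparison: {naive_diff(d_selected):+.3f}")
print(f"randomized comparison:    {naive_diff(d_random):+.3f}")
```

Under self-selection, the comparison has the wrong sign. Under random assignment, it lands close to −0.3, up to sampling noise.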

Empirical Examples: Non-experimental vs. Randomized Studies

Research Area	Non-experimental Finding	Randomized Trial Result
Hormone Replacement Therapy	Nurses' Health Study: HRT users healthier	Women's Health Initiative (WHI): Few benefits, serious side effects
Job Training Programs	Trainees earn less than non-trainees	Mostly positive effects (Lalonde, 1986)

2.3 The Tennessee STAR Experiment

Experiment Overview

Purpose: Estimate effects of class size on student achievement
Duration: Started 1985/86, ran for 4 years (K through 3rd grade)
Scale: ~11,600 students, cost ~$12 million
Treatment Arms:
1. Small classes (13-17 students)
2. Regular classes (22-25 students) with a part-time aide
3. Regular classes with full-time aide

Balance Check: Verifying Random Assignment

Compare pre-treatment characteristics across groups:

Variable	Small	Regular	Regular/Aide	P-value (equality across groups)
Free lunch	.47	.48	.50	.09
White/Asian	.68	.67	.66	.26
Age in 1985	5.44	5.43	5.42	.32
Kindergarten class size	15.10	22.40	22.80	.00
Kindergarten percentile score	54.70	48.90	50.00	.00

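✅ Student characteristics (free lunch, race, age) are balanced (no p-value below .05) → consistent with successful random assignment. Kindergarten class size and percentile score do differ, but they are the treatment and an outcome, not pre-treatment characteristics.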
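A balance check compares covariate means across arms and asks whether the differences exceed what chance would produce. The data below are simulated, not STAR. The arm coding and the helper names `between_group_stat` and `permutation_pvalue` are hypothetical. So that the sketch needs only numpy, a permutation test stands in for a parametric joint test of equal means.

```python
import numpy as np

def between_group_stat(x, groups):
    """Size-weighted squared deviations of group means from the grand mean
    (the between-group sum of squares in a one-way ANOVA)."""
    grand = x.mean()
    return sum((groups == g).sum() * (x[groups == g].mean() - grand) ** 2
               for g in np.unique(groups))

def permutation_pvalue(x, groups, n_perm=999, seed=0):
    """Share of random relabelings of `groups` whose statistic is at least
    as large as the observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = between_group_stat(x, groups)
    hits = sum(between_group_stat(x, rng.permutation(groups)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Simulated data, not STAR. Arms: 0 = small, 1 = regular, 2 = regular/aide.
rng = np.random.default_rng(42)
n = 3000
arm = rng.integers(0, 3, size=n)
free_lunch = rng.binomial(1, 0.48, size=n).astype(float)  # pre-treatment, independent of arm
class_size = np.where(arm == 0, 15, 22) + rng.integers(0, 3, size=n)  # the treatment itself

for name, x in [("free lunch", free_lunch), ("class size", class_size.astype(float))]:
    means = [round(float(x[arm == g].mean()), 2) for g in range(3)]
    print(f"{name}: means by arm {means}, p = {permutation_pvalue(x, arm):.3f}")
```

As in the table, a pre-treatment covariate should show no systematic difference. Class size, which is the treatment, differs by construction.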

Main Results

Dependent variable: kindergarten percentile score; standard errors in parentheses.

Variable	(1)	(2)	(3)	(4)
Small class	4.82 (2.19)	5.37 (1.26)	5.36 (1.21)	5.37 (1.19)
Regular/aide	.12 (2.23)	.29 (1.13)	.53 (1.09)	.31 (1.07)
School FE	No	Yes	Yes	Yes
Student controls	No	No	Yes	Yes

Key Findings:

Small class effect: about 5 percentile points (4.82 to 5.37 across specifications)
Effect size: ~0.2 standard deviations (σ)
Regular/aide effect: Small and statistically insignificant
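A back-of-the-envelope conversion from percentile points to standard deviations follows. It assumes percentile ranks are roughly uniform on [0, 100], which is an assumption rather than a figure from the study.

```python
import math

small_class_effect = 5.37  # percentile points, column (2) of the table above

# Assumption: percentile ranks are roughly uniform on [0, 100],
# and a uniform distribution on [0, 100] has SD 100 / sqrt(12) ≈ 28.9.
sd_percentile = 100 / math.sqrt(12)

effect_in_sd = small_class_effect / sd_percentile
print(f"{effect_in_sd:.2f} standard deviations")  # → 0.19 standard deviations
```

Under this assumption the result lands close to the ~0.2σ quoted above.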

2.4 The Attrition Problem

Definition

Attrition: Participants dropping out during the course of an experiment

Attrition in the STAR Experiment

Time Point	Number of Students
Start (Kindergarten)	~11,600
End (3rd Grade)	Some attrition

Reasons for attrition:

School transfers
Dropping out
Refusal to continue participation
Missing data

Why Is This a Problem?

Key issue: Attrition may not be random!

Scenario	Problem
Low-performing students in small classes transfer out more	Small-class average among stayers ↑ → Effect overestimated
High-performing students in regular classes transfer out more	Regular-class average among stayers ↓ → Effect overestimated

→ Random assignment is compromised! → Selection bias re-emerges

Mathematical Understanding

Initially, random assignment succeeds:

E[Y_0i | D_i = 1] = E[Y_0i | D_i = 0]

After attrition:

E[Y_0i | D_i = 1, Stayer] ≠ E[Y_0i | D_i = 0, Stayer]

→ Those who remain may no longer be comparable!
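The following is a simulation sketch of the stayer condition above. All of the following are hypothetical choices:

- the achievement distribution
- the +5 effect
- the rule that low achievers in small classes leave more often

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
d = rng.integers(0, 2, size=n)          # random assignment: 1 = small class
y0 = rng.normal(50, 28, size=n)         # hypothetical baseline achievement
y = y0 + 5.0 * d                        # assumed true effect: +5 points

# Hypothetical attrition rule: low achievers in small classes leave more often.
p_leave = np.where((d == 1) & (y0 < 40), 0.5, 0.1)
stayer = rng.random(n) > p_leave

full_sample = y[d == 1].mean() - y[d == 0].mean()
stayers_only = y[stayer & (d == 1)].mean() - y[stayer & (d == 0)].mean()
# E[Y_0i | D_i = 1, Stayer] - E[Y_0i | D_i = 0, Stayer]
baseline_gap = y0[stayer & (d == 1)].mean() - y0[stayer & (d == 0)].mean()

print(f"all assigned students:  {full_sample:.2f}")
print(f"stayers only:           {stayers_only:.2f}")
print(f"baseline gap (stayers): {baseline_gap:.2f}")
```

In real data the first line is unavailable, because attriters' scores are missing. The stayers-only comparison then adds the nonzero baseline gap to the true effect.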

Solutions to the Attrition Problem

Method	Description
Compare attrition rates	Check if attrition rates are similar across treatment/control groups
Compare attriter characteristics	Analyze who dropped out (what characteristics do attriters have?)
Bounds analysis	Estimate range of effects under worst/best case scenarios
ITT analysis	Analyze everyone by original assignment (Intent-to-Treat), including those who later switched or left; requires their outcomes to be measured
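For the bounds row, one standard approach is worst-case (Manski-style) bounds: each missing outcome is filled in with the most favorable and the least favorable value the outcome can take. The sketch below makes several hypothetical choices:

- scores are bounded in [0, 100], like percentile ranks
- the function name `worst_case_bounds`
- a 10% attrition rate
- the simulated data

```python
import numpy as np

def worst_case_bounds(y, d, observed, y_min=0.0, y_max=100.0):
    """Worst-case (Manski-style) bounds on E[Y|D=1] - E[Y|D=0] with missing outcomes.

    Assumes every outcome lies in [y_min, y_max]. Missing outcomes are filled
    with the extreme values that make the effect as small or as large as possible.
    """
    def filled_mean(group, fill):
        vals = np.where(observed[group], y[group], fill)
        return vals.mean()

    treated, control = (d == 1), (d == 0)
    lower = filled_mean(treated, y_min) - filled_mean(control, y_max)
    upper = filled_mean(treated, y_max) - filled_mean(control, y_min)
    return lower, upper

# Simulated percentile-style scores with ~10% attrition (hypothetical numbers).
rng = np.random.default_rng(3)
n = 2000
d = rng.integers(0, 2, size=n)
y = np.clip(rng.normal(50 + 5 * d, 25), 0, 100)
observed = rng.random(n) > 0.1
y = np.where(observed, y, np.nan)      # attriters' scores are unknown

lo, hi = worst_case_bounds(y, d, observed)
print(f"effect bounds: [{lo:.1f}, {hi:.1f}]")
```

Such bounds are often wide. On a 0 to 100 scale, each percentage point of attrition in either group widens the interval by about one point.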

ITT (Intent-to-Treat) Analysis:

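Analyze based on originally assigned group
Ignore whether treatment was actually received (e.g., students who later switched class types)
Avoids selection bias from non-compliance; it also avoids attrition bias only if outcomes can still be measured for those who left
Drawback: May understate the effect of actually receiving treatment when some participants do not comply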

2.5 Regression Analysis of Experiments

Constant Treatment Effect Model

Assume treatment effect is the same for everyone (Y_1i − Y_0i = ρ):

Y_i = α + ρ D_i + η_i

where α = E[Y_0i] (average outcome without treatment)
ρ = Y_1i − Y_0i (treatment effect)
η_i = Y_0i − E[Y_0i] (individual deviation)

Selection Bias as Regression

E[Y_i|D_i=1] − E[Y_i|D_i=0] = ρ + [E[η_i|D_i=1] − E[η_i|D_i=0]]

where ρ is the treatment effect and the bracketed term is the selection bias, which is nonzero when the error η_i is correlated with the regressor D_i.

With random assignment: Selection bias = 0 → Regression coefficient estimates causal effect

Role of Covariates

Long regression:

Y_i = α + ρD_i + X_i'γ + η_i

Role	Explanation	STAR Example
1. Control for conditional randomization	When randomization is within strata, control for stratification variable	Randomized within schools → Include school fixed effects
2. Improve precision	Even if X_i is uncorrelated with D_i, explaining Y_i variance reduces SE	Race, age, free lunch → SE drops (1.26 → 1.21)

Quasi-Experimental Approach: Angrist & Lavy (1999)

When randomized trials are impractical, use natural experiments

Setting: Israeli class size cap = 40 students (Maimonides' Rule)

5th grade cohort of 40 → class size = 40
5th grade cohort of 41 → split into two classes → average class size 20.5
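The cap implies a predicted average class size that drops sharply each time enrollment crosses a multiple of 40. The sketch below assumes a strict cap, with the cohort split into the fewest classes that respect it. This yields the form e / (int((e − 1)/40) + 1), which Angrist & Lavy (1999) use for predicted class size. The function name is hypothetical.

```python
def maimonides_class_size(enrollment: int, cap: int = 40) -> float:
    """Predicted average class size when a cohort is split into the fewest
    classes that keep every class at or below `cap` students."""
    n_classes = (enrollment - 1) // cap + 1   # int((e - 1) / cap) + 1
    return enrollment / n_classes

for e in (40, 41, 80, 81, 120):
    print(e, maimonides_class_size(e))
# 40 40.0
# 41 20.5
# 80 40.0
# 81 27.0
# 120 40.0
```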

Key Assumption

Students in cohorts of 40 vs 41 are similar on other dimensions → "as good as randomly assigned"

Results Comparison

Analysis Method	Result
Naive comparison	Small class students score lower (selection bias)
Quasi-experimental (RDD)	Strong negative relationship between class size and achievement (smaller classes → higher scores)

Chapter 2 Summary

Concept	Description
Potential Outcomes	Y_1i, Y_0i: Hypothetical outcomes under each treatment state
Causal Effect	Y_1i − Y_0i: Individual treatment effect
Selection Bias	Difference in average untreated outcomes, E[Y_0i|D_i=1] − E[Y_0i|D_i=0], between treated and untreated
Random Assignment	Makes D_i independent of potential outcomes, eliminating selection bias
Natural Experiment	Uses exogenous variation to approximate random assignment

Appendix: Regression Analysis of Experiments (Deep Dive)

A.1 Why Use Regression?

The simplest way to estimate treatment effects in an experiment:

Ȳ_treated − Ȳ_control

With regression:

Y_i = α + ρD_i + η_i

Here, ρ̂ is identical to Ȳ_treated − Ȳ_control!
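A quick numerical check of this claim on simulated data follows. The coefficients 2.0 and 1.5 are arbitrary assumptions. OLS of Y on a constant and a 0/1 dummy returns the control mean as the intercept and the difference in means as the slope.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
d = rng.integers(0, 2, size=n).astype(float)    # randomized 0/1 treatment
y = 2.0 + 1.5 * d + rng.normal(size=n)          # hypothetical outcome model

# OLS of y on a constant and d.
X = np.column_stack([np.ones(n), d])
alpha_hat, rho_hat = np.linalg.lstsq(X, y, rcond=None)[0]

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(f"rho_hat = {rho_hat:.6f}, difference in means = {diff_in_means:.6f}")
assert np.isclose(rho_hat, diff_in_means)           # slope = treated mean - control mean
assert np.isclose(alpha_hat, y[d == 0].mean())      # intercept = control mean
```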

Why bother with regression?

Easy to control for covariates
Convenient standard error calculation
Flexible model extensions

A.2 Deriving the Constant Treatment Effect Model

Assumption: Treatment effect is identical for everyone

Y_1i − Y_0i = ρ (constant)

Decompose the potential outcome:

Y_0i = E[Y_0i] + (Y_0i − E[Y_0i])

Y_0i = α + η_i

where α = E[Y_0i] (the mean) and η_i = Y_0i − E[Y_0i] (the individual deviation)

Observed outcome:

Y_i = Y_0i + (Y_1i − Y_0i) · D_i
= (α + η_i) + ρ · D_i
= α + ρD_i + η_i

Term	Meaning
α	E[Y_0i], average outcome without treatment
ρ	Y_1i − Y_0i, treatment effect
η_i	Y_0i − E[Y_0i], individual random error

A.3 Selection Bias as Regression

Conditional expectations in the regression model:

E[Y_i | D_i = 1] = α + ρ + E[η_i | D_i = 1]
E[Y_i | D_i = 0] = α + E[η_i | D_i = 0]

Taking the difference:

E[Y_i|D_i=1] − E[Y_i|D_i=0] = ρ + (E[η_i|D_i=1] − E[η_i|D_i=0])

where ρ is the treatment effect and the term in parentheses is the selection bias, which is nonzero when the error η_i is correlated with treatment D_i.

This equals the selection bias we saw earlier:

E[η_i|D_i=1] − E[η_i|D_i=0] = E[Y_0i|D_i=1] − E[Y_0i|D_i=0]

A.4 Random Assignment → OLS Estimates Causal Effect

Under random assignment:

D_i ⊥ η_i

Therefore:

E[η_i | D_i = 1] = E[η_i | D_i = 0] = E[η_i] = 0

Result:

E[Y_i | D_i = 1] − E[Y_i | D_i = 0] = ρ

→ The OLS estimate ρ̂ estimates the causal effect ρ without selection bias!

A.5 Two Roles of Adding Covariates

Long regression:

Y_i = α + ρD_i + X_i'γ + η_i

Role 1: Control for Conditional Random Assignment

In the STAR experiment:

Random assignment within schools
Not random across schools (urban vs rural)

Y_i = α + ρD_i + Σ_j δ_j · 𝟙[School_i = j] + η_i

Why necessary? Consider a hypothetical two-school example:

School	Treatment Prob.	Avg. Score
Urban A	40%	High
Rural B	30%	Low

→ Without school controls, treated students come disproportionately from high-scoring Urban A, so ρ̂ is biased upward even if class size has no effect
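The following sketches the two-school illustration. It makes these assumptions:

- class size has no effect at all (true ρ = 0)
- the hypothetical 40% and 30% treatment probabilities from the table
- assumed score levels of 60 for Urban A and 40 for Rural B

Randomization within each school is valid, yet pooling the schools without a school dummy produces a spurious positive ρ̂.

```python
import numpy as np

rng = np.random.default_rng(11)
n_per_school = 50_000
school = np.repeat([0, 1], n_per_school)       # 0 = Urban A, 1 = Rural B
p_treat = np.where(school == 0, 0.4, 0.3)      # randomized within each school
d = (rng.random(school.size) < p_treat).astype(float)

true_rho = 0.0                                 # assume class size has no effect at all
base = np.where(school == 0, 60.0, 40.0)       # Urban A scores higher on average
y = base + true_rho * d + rng.normal(0, 10, school.size)

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(school.size)
rural = (school == 1).astype(float)            # school dummy (Urban A is the omitted group)
rho_no_fe = ols(np.column_stack([ones, d]), y)[1]
rho_fe = ols(np.column_stack([ones, d, rural]), y)[1]
print(f"without school dummies: {rho_no_fe:.2f}")
print(f"with school dummies:    {rho_fe:.2f}")
```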

Role 2: Improve Estimation Precision

Key principle: If X_i explains variance in Y_i, residual variance decreases, reducing SE of ρ̂

Short regression: Y_i = α + ρD_i + η_i

Var(ρ̂) ∝ Var(η_i) / n

Long regression: Y_i = α + ρD_i + X_i'γ + η̃_i

Var(ρ̂) ∝ Var(η̃_i) / n

If X_i explains Y_i well: Var(η̃_i) < Var(η_i)
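The following is a simulation sketch of the precision gain. It uses a hypothetical covariate x that strongly predicts Y_i and is independent of the randomized D_i. The helper `ols_with_se` computes conventional (homoskedastic) standard errors. The data-generating numbers are assumptions.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                           # hypothetical covariate that predicts y
d = rng.integers(0, 2, size=n).astype(float)     # randomized, so independent of x
y = 50 + 5 * d + 15 * x + rng.normal(0, 10, n)

ones = np.ones(n)
b_short, se_short = ols_with_se(np.column_stack([ones, d]), y)
b_long, se_long = ols_with_se(np.column_stack([ones, d, x]), y)
print(f"short: rho = {b_short[1]:.2f} (SE {se_short[1]:.2f})")
print(f"long:  rho = {b_long[1]:.2f} (SE {se_long[1]:.2f})")
```

Both estimates center on the assumed effect of 5. The long regression's SE is smaller because x absorbs much of the residual variance.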

STAR experiment results:

Model	Small Class Effect	Std. Error
School FE only (col. 2)	5.37	1.26
School FE + student controls (col. 3)	5.36	1.21

→ Estimate nearly identical, only standard error decreases!

A.6 Key Point: Short vs Long Regression

If random assignment succeeded:

ρ̂_short ≈ ρ̂_long

Why? Because D_i is uncorrelated with X_i!

Mathematically (Omitted Variable Bias formula):

ρ̂_short = ρ̂_long + γ̂ · Cov(D_i, X_i) / Var(D_i)   (single covariate X_i)

The second term is ≈ 0 under random assignment, because Cov(D_i, X_i) ≈ 0.
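For OLS estimates with an intercept, the formula is an exact algebraic identity, and a small sketch can confirm it. Here x is deliberately made slightly correlated with d (an assumption) so that the second term is visible. Under successful randomization that correlation would be close to zero, and so would the gap between the short and long estimates.

```python
import numpy as np

def ols(*regressors, y):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 2000
d = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n) + 0.2 * d      # deliberately correlated with d (hypothetical)
y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(size=n)

rho_short = ols(d, y=y)[1]
_, rho_long, gamma_long = ols(d, x, y=y)
delta = np.cov(d, x)[0, 1] / np.var(d, ddof=1)   # Cov(D, X) / Var(D)

print(f"short: {rho_short:.4f}  long + OVB term: {rho_long + gamma_long * delta:.4f}")
assert np.isclose(rho_short, rho_long + gamma_long * delta)
```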

A.7 Summary

Scenario	Regression Result
Random assignment ✓	ρ̂ estimates the causal effect (ATE)
Random assignment ✗	ρ̂ estimates causal effect + selection bias
Add covariates (under randomization)	Nearly the same estimate, smaller SE if X_i predicts Y_i
Add covariates (conditional randomization)	Required (removes bias)

References

Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics.
Angrist, J. D., & Lavy, V. (1999). Using Maimonides' rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association.
Lalonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review.