Angrist Ch.4-1 - IV Basics, Wald & 2SLS

Chapter 4 Part 1: IV Basics, Wald & 2SLS

Angrist & Pischke, Mostly Harmless Econometrics — Sections 4.1–4.3

Core Message

Instrumental Variables (IV) solves omitted variables bias by using a variable (the instrument) that affects the outcome only through its effect on the treatment. The IV estimand is the ratio of the reduced form (instrument → outcome) to the first stage (instrument → treatment).

Key questions this part answers:

What assumptions make IV valid? → Exclusion restriction + first stage
How does 2SLS work? → Replace endogenous variable with first-stage fitted values
What is the Wald estimator? → Simplest IV with a binary instrument
How are grouped data and 2SLS related? → 2SLS with dummies = GLS on group means

4.1 IV and Causality

The Problem IV Solves

Suppose the "long regression" (with all necessary controls) is:

y_i = α + ρs_i + A_i'γ + v_i

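where A_i ("ability") is the set of controls that, once included, leaves schooling s_i uncorrelated with v_i. If A_i is unobserved, it is absorbed into the error of the "short regression" y_i = α + ρs_i + ε_i, with ε_i = A_i'γ + v_i. Because s_i is correlated with ε_i, the OLS slope from the short regression (call it ρ̃) is biased for ρ. IV recovers ρ without observing A_i.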

The IV Setup (Constant Effects)

An instrument z_i must satisfy two conditions:

Condition	Formal Statement	Meaning
Relevance (First stage)	Cov(s_i, z_i) ≠ 0	The instrument actually affects the treatment
Exclusion restriction	Cov(ε_i, z_i) = 0	The instrument affects the outcome only through the treatment

The IV Estimand

ρ = Cov(y_i, z_i) / Cov(s_i, z_i) = Reduced form / First stage

The causal effect is the ratio of two regression coefficients:

Reduced form: regression of y_i on z_i (how the instrument affects the outcome)
First stage: regression of s_i on z_i (how the instrument affects the treatment)
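The ratio can be checked numerically. The sketch below is a simulation under assumed values (a true effect of 0.10, a randomly assigned binary instrument, and an unobserved "ability" term); none of the numbers come from the book. It computes the IV estimate both as a covariance ratio and as reduced form ÷ first stage, and compares it with OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.10  # true causal effect (an assumption of this simulation)

z = rng.integers(0, 2, size=n).astype(float)        # instrument, randomly assigned
ability = rng.normal(size=n)                        # unobserved confounder A_i
s = 12 + 1.0 * z + ability + rng.normal(size=n)     # treatment (schooling)
y = 1.0 + rho * s + 0.5 * ability + rng.normal(size=n)  # outcome

def slope(dep, reg):
    """OLS slope from a bivariate regression of dep on reg (with a constant)."""
    return np.cov(dep, reg)[0, 1] / np.var(reg, ddof=1)

first_stage = slope(s, z)                  # instrument -> treatment
reduced_form = slope(y, z)                 # instrument -> outcome
iv_ratio = reduced_form / first_stage      # reduced form / first stage
iv_cov = np.cov(y, z)[0, 1] / np.cov(s, z)[0, 1]   # Cov(y, z) / Cov(s, z)
ols_slope = slope(y, s)                    # short regression, biased by ability

print(f"OLS: {ols_slope:.3f}  IV (ratio): {iv_ratio:.3f}  IV (cov): {iv_cov:.3f}")
```

The two IV expressions agree up to floating-point error because Var(z) cancels from the ratio of slopes; in this design OLS is pulled away from 0.10 by the omitted ability term.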

Example: Quarter of Birth (Angrist & Krueger 1991)

Logic: School start-age rules + compulsory schooling laws → children born in early quarters get slightly less schooling.

Treatment: years of education (s_i)
Instrument: quarter of birth (z_i)
Outcome: log weekly wages (y_i)

Why valid? Date of birth is essentially random and plausibly affects earnings only through schooling.

The Two Equations

First stage: s_i = X_i'π₁₀ + π₁₁z_i + η_1i

Reduced form: y_i = X_i'π₂₀ + π₂₁z_i + η_2i

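The IV estimand is ρ = π₂₁ / π₁₁. Its sample analog, the ratio of the estimated reduced-form coefficient to the estimated first-stage coefficient, is called the Indirect Least Squares (ILS) estimator.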

4.1.1 Two-Stage Least Squares (2SLS)

2SLS operationalizes IV as a two-step procedure:

Stage 1: Regress the endogenous variable on instruments and covariates to get fitted values.

ŝ_i = X_i'π̂₁₀ + π̂₁₁z_i

Stage 2: Regress the outcome on fitted values and covariates.

y_i = δ'X_i + ρŝ_i + [ε_i + ρ(s_i − ŝ_i)]

Why does it work?

ŝ_i retains only the variation in schooling driven by the instrument
This quasi-experimental variation is uncorrelated with the error term
With a single instrument, 2SLS = ILS (reduced form ÷ first stage)
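A minimal sketch of the two stages, assuming simulated data with one covariate x, one binary instrument z, and a true effect of 0.1 (all hypothetical values). It runs the two OLS steps by hand and checks the result against ILS. The manual version is for intuition about the point estimate only; it does not produce valid standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)                         # observed covariate
z = rng.integers(0, 2, size=n).astype(float)   # single instrument
a = rng.normal(size=n)                         # unobserved confounder
s = 0.8 * z + 0.3 * x + a + rng.normal(size=n)
y = 0.1 * s + 0.2 * x + 0.5 * a + rng.normal(size=n)

def ols(dep, *regs):
    """OLS coefficients of dep on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(dep)), *regs])
    return np.linalg.lstsq(X, dep, rcond=None)[0]

# Stage 1: regress s on covariate and instrument, form fitted values
pi = ols(s, x, z)                              # [const, x, z]
s_hat = pi[0] + pi[1] * x + pi[2] * z

# Stage 2: regress y on covariate and fitted values
rho_2sls = ols(y, x, s_hat)[2]

# ILS: reduced-form coefficient on z divided by first-stage coefficient on z
rho_ils = ols(y, x, z)[2] / pi[2]

print(f"2SLS: {rho_2sls:.4f}  ILS: {rho_ils:.4f}")
```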

Multiple Instruments

With three quarter-of-birth dummies (z_1i, z_2i, z_3i), the first stage becomes:

s_i = X_i'π₁₀ + π₁₁z_1i + π₁₂z_2i + π₁₃z_3i + η_1i

2SLS combines multiple instruments into a single fitted value, using the first-stage coefficients as weights (the efficient combination when errors are homoskedastic). The exclusion restriction requires that every instrument be uncorrelated with the structural error.
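With several instruments the same logic can be written in matrix form. The following is a sketch on simulated data: the quarter-of-birth effects (−1.2, −0.8, −0.4) and the true return of 0.1 are invented for illustration. It builds fitted values from three dummies and checks that the second-stage regression matches the closed-form 2SLS expression (Ŵ'W)⁻¹Ŵ'y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
q = rng.integers(1, 5, size=n)                 # hypothetical quarter of birth, 1..4
Z = np.column_stack([(q == k).astype(float) for k in (1, 2, 3)])  # Q4 omitted
a = rng.normal(size=n)                         # unobserved confounder
s = 12 + Z @ np.array([-1.2, -0.8, -0.4]) + a + rng.normal(size=n)
y = 0.1 * s + 0.5 * a + rng.normal(size=n)

const = np.ones((n, 1))
W = np.column_stack([const, s])                # second-stage regressors
Q = np.column_stack([const, Z])                # instruments plus constant

# First stage: project every regressor on the instrument set
W_hat = Q @ np.linalg.lstsq(Q, W, rcond=None)[0]

# 2SLS in closed form, and as OLS of y on the fitted values
beta_2sls = np.linalg.solve(W_hat.T @ W, W_hat.T @ y)
beta_manual = np.linalg.lstsq(W_hat, y, rcond=None)[0]
rho_2sls = beta_2sls[1]

print(f"2SLS with three dummies: {rho_2sls:.4f}")
```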

Results: Returns to Schooling

Specification	OLS	2SLS	Instruments
No controls	0.075	0.103 (0.024)	QOB=1 dummy
YOB + SOB dummies	0.072	0.108 (0.019)	3 QOB dummies
+ QOB×YOB interactions	0.072	0.089 (0.016)	30 instruments

The 2SLS estimates are somewhat larger than the OLS estimates, which suggests that omitted variables bias is not what drives the schooling-earnings relationship in this case.

4.1.2 The Wald Estimator

The simplest IV setup: a single binary instrument, no covariates.

The Wald formula:

ρ = [E(y_i|z_i=1) − E(y_i|z_i=0)] / [E(s_i|z_i=1) − E(s_i|z_i=0)]

= Difference in outcome means ÷ Difference in treatment means
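The step from the covariance ratio to the Wald formula can be written out. Here p = P(z_i = 1) is notation introduced for this derivation only:

```latex
\begin{aligned}
\operatorname{Cov}(y_i, z_i)
  &= E[y_i z_i] - E[y_i]\,E[z_i] \\
  &= p\,E[y_i \mid z_i = 1]
     - p\,\bigl(p\,E[y_i \mid z_i = 1] + (1-p)\,E[y_i \mid z_i = 0]\bigr) \\
  &= p(1-p)\,\bigl(E[y_i \mid z_i = 1] - E[y_i \mid z_i = 0]\bigr)
\end{aligned}
```

The same algebra applies to Cov(s_i, z_i), so the factor p(1 − p) cancels in Cov(y_i, z_i) / Cov(s_i, z_i), leaving the ratio of differences in means.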

Example 1: Returns to Schooling

	Born Q1–Q2	Born Q3–Q4	Difference
ln(weekly wage)	5.8916	5.9051	−0.01349
Years of education	12.6881	12.8394	−0.1514
Wald estimate			0.0891 (0.021)
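As a quick arithmetic check, the Wald estimate can be reproduced from the two differences in the table. The differences are taken as reported (they were presumably computed from unrounded means, which is why 12.6881 − 12.8394 does not give exactly −0.1514).

```python
# Differences between men born in Q1-Q2 and Q3-Q4, as reported in the table
wage_diff = -0.01349   # difference in mean ln(weekly wage): reduced form
educ_diff = -0.1514    # difference in mean years of education: first stage

wald = wage_diff / educ_diff
print(round(wald, 4))  # 0.0891
```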

Example 2: Vietnam Draft Lottery (Angrist 1990)

Setup: Random draft lottery numbers → draft eligibility → military service → earnings

Instrument: draft-eligibility (random, binary)
Treatment: veteran status
Draft-eligible men were 15.9 pp more likely to serve
Wald estimate: service reduced 1981 earnings by ~$2,741

Validity check: Draft eligibility has no effect on 1969 earnings (pre-lottery), which is consistent with random assignment of the instrument.
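Because Wald = reduced form ÷ first stage, the two figures above pin down the reduced form. The short calculation below backs it out; the result is implied by the numbers in these notes rather than quoted from them.

```python
first_stage = 0.159    # draft-eligible men were 15.9 pp more likely to serve
wald = -2741.0         # Wald estimate of the effect of service on 1981 earnings ($)

# Reduced form = Wald estimate x first stage
implied_reduced_form = wald * first_stage
print(round(implied_reduced_form, 1))  # -435.8
```

Read this way, draft eligibility itself lowered 1981 earnings by roughly $436 on average, and dividing by the 15.9 pp first stage scales that up to the effect of actually serving.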

Example 3: Fertility and Labor Supply (Angrist & Evans 1998)

Two instruments for having a third child among mothers with ≥2 children:

Outcome	OLS	Twins IV (1st stage: 0.625)	Same-sex IV (1st stage: 0.067)
Employment	−0.167	−0.083	−0.135
Weeks worked	−8.05	−3.83	−6.23

Different instruments yield different estimates → foreshadows heterogeneous effects (Part 2).

4.1.3 Grouped Data and 2SLS

Key insight: 2SLS with dummy instruments = GLS on group means = Efficient linear combination of all possible Wald estimators.

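When the instrument takes on J discrete values (j = 1, …, J), define the group means ȳ_j (average outcome in cell j) and p̂_j (average treatment in cell j; for a binary treatment, the share treated). The grouped regression is: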

ȳ_j = α + ρp̂_j + ε̄_j

GLS (weighted by group size n_j) on this equation equals 2SLS using a full set of group dummies as instruments.
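The equivalence can be verified on simulated data. This sketch assumes ten instrument cells with hypothetical treatment thresholds and a true effect of 2.0. It computes 2SLS on the micro data (the first-stage fitted value is just the cell mean of the treatment) and weighted least squares on the ten group means with weights n_j, which is the GLS estimator when individual errors are homoskedastic.

```python
import numpy as np

rng = np.random.default_rng(4)
n, J = 50_000, 10
g = rng.integers(0, J, size=n)                 # instrument cell for each person
cutoff = np.linspace(1.0, -0.5, J)             # hypothetical cell-specific thresholds
a = rng.normal(size=n)                         # unobserved confounder
d = (a + rng.normal(size=n) > cutoff[g]).astype(float)   # binary treatment
y = 1.0 + 2.0 * d + a + rng.normal(size=n)     # true effect assumed to be 2.0

# Group sizes and group means
counts = np.bincount(g, minlength=J)
y_bar = np.bincount(g, weights=y, minlength=J) / counts
p_hat = np.bincount(g, weights=d, minlength=J) / counts

# 2SLS with a full set of cell dummies: fitted treatment = cell mean
X_micro = np.column_stack([np.ones(n), p_hat[g]])
rho_2sls = np.linalg.lstsq(X_micro, y, rcond=None)[0][1]

# Weighted least squares on group means, weights n_j
w = np.sqrt(counts)
X_group = np.column_stack([np.ones(J), p_hat])
rho_grouped = np.linalg.lstsq(X_group * w[:, None], y_bar * w, rcond=None)[0][1]

print(f"2SLS: {rho_2sls:.4f}  grouped WLS: {rho_grouped:.4f}")
```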

Visual Instrumental Variables (VIV)

A VIV plot displays the grouped-data relationship: average outcome vs. probability of treatment, across instrument cells. The slope of the line through these points is the IV estimate. This provides a powerful visual check on the IV strategy.

Draft lottery VIV (Angrist 1990): Plotting average earnings residuals against the probability of service across cells of five consecutive lottery numbers (random sequence numbers, RSN) gives an IV estimate of about −$2,400, consistent with the Wald estimate.

4.2 Asymptotic 2SLS Inference

4.2.1 Standard Errors

The correct 2SLS standard errors differ from the standard errors reported by a manual second-stage OLS regression. The error variance should be estimated from the structural residual ε_i = y_i − δ'X_i − ρs_i (computed with the actual s_i), not from the second-stage residual ε_i + ρ(s_i − ŝ_i). Use a packaged 2SLS routine to get correct standard errors.

Warning: Running "manual 2SLS" (regressing y on ŝ by OLS) reproduces the 2SLS point estimate but gives wrong standard errors. The second-stage residual includes the first-stage residual s_i − ŝ_i scaled by ρ, so its variance is not the variance of ε_i.
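The difference is easy to see in a simulation. The sketch below uses assumed parameter values and the conventional homoskedastic variance formula. It computes the standard error twice from the same point estimate: once with the manual second-stage residual and once with the structural residual.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z = rng.integers(0, 2, size=n).astype(float)   # instrument
a = rng.normal(size=n)                         # unobserved confounder
s = 1.0 * z + a + rng.normal(size=n)
y = 0.1 * s + 0.5 * a + rng.normal(size=n)

X = np.column_stack([np.ones(n), s])           # structural regressors (actual s)
Zm = np.column_stack([np.ones(n), z])          # instrument matrix
X_hat = Zm @ np.linalg.lstsq(Zm, X, rcond=None)[0]   # first-stage fitted values
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # 2SLS point estimate

XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
dof = n - X.shape[1]

resid_manual = y - X_hat @ beta    # second-stage residual: eps + rho*(s - s_hat)
resid_struct = y - X @ beta        # structural residual: eps

se_manual = np.sqrt(resid_manual @ resid_manual / dof * XtX_inv[1, 1])
se_2sls = np.sqrt(resid_struct @ resid_struct / dof * XtX_inv[1, 1])

print(f"manual SE: {se_manual:.5f}  correct 2SLS SE: {se_2sls:.5f}")
```

In this particular design the manual standard error comes out too large. The sign of the discrepancy depends on ρ and on the covariance between ε_i and the first-stage residual, so the manual figure can err in either direction.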

4.2.2 Over-Identification Tests

When you have more instruments than endogenous variables (over-identification), you can test whether all instruments give the same answer.

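Over-ID test statistic: Under H₀: E[Z_iε_i] = 0 and homoskedastic errors, the minimized 2SLS minimand is asymptotically distributed χ²(q−1), where q is the number of instruments for a single endogenous regressor. In general, the degrees of freedom equal the number of instruments minus the number of endogenous regressors.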

Computation: N × R² from regressing 2SLS residuals on all instruments and covariates.
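A sketch of the N × R² computation, assuming simulated data with three valid instruments (hypothetical first-stage coefficients), one endogenous regressor, and no covariates. The 5% critical value of χ²(2), about 5.99, is the benchmark in this setup.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
Z = rng.normal(size=(n, 3))                    # three hypothetical valid instruments
a = rng.normal(size=n)                         # unobserved confounder
s = Z @ np.array([0.5, 0.4, 0.3]) + a + rng.normal(size=n)
y = 0.1 * s + 0.5 * a + rng.normal(size=n)

const = np.ones((n, 1))
X = np.column_stack([const, s])                # structural regressors
Q = np.column_stack([const, Z])                # instruments plus constant

# 2SLS point estimate and residuals (residuals use the actual s)
X_hat = Q @ np.linalg.lstsq(Q, X, rcond=None)[0]
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
resid = y - X @ beta

# Regress the 2SLS residuals on all instruments and take R^2
fitted = Q @ np.linalg.lstsq(Q, resid, rcond=None)[0]
r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)

over_id_stat = n * r2
df = Z.shape[1] - 1                            # instruments minus endogenous regressors
print(f"over-ID statistic: {over_id_stat:.2f} on {df} degrees of freedom")
```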

With dummy instruments, the over-ID test is equivalent to a chi-square goodness-of-fit test for the VIV plot: does a straight line fit the group means well?

Caveat: Over-ID tests have limited practical value.

When IV estimates are imprecise, the test has low power (can't reject even if instruments are bad)
When IV estimates are precise, rejection may reflect treatment effect heterogeneity, not invalid instruments

4.3 Two-Sample IV and Split-Sample IV

Two-Sample IV (TSIV)

IV can be constructed from sample moments alone. The first-stage and reduced-form data need not come from the same dataset, as long as both are drawn from the same population.

When is TSIV useful? When no single dataset contains all needed variables. For example:

Data set 1 (SSA records): earnings + draft lottery numbers → reduced form
Data set 2 (military records): veteran status + lottery numbers → first stage

Split-Sample IV (SSIV)

Angrist & Krueger (1995) proposed a computationally simple TSIV estimator:

Estimate the first stage in data set 2: π̂ = (Z₂'Z₂)⁻¹Z₂'W₂, where Z₂ holds the instruments and W₂ the regressors (including the endogenous variable)
Construct cross-sample fitted values: Ŵ₁₂ = Z₁π̂
Regress y₁ on Ŵ₁₂ in data set 1
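The three steps can be sketched with two independent simulated samples from the same assumed population (true effect 0.1, one binary instrument). Data set 1 is treated as if it contained only the outcome and the instrument, and data set 2 only the treatment and the instrument. The helper `draw` is a hypothetical data generator, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(7)

def draw(n):
    """Draw (y, w, z) from one assumed population; hypothetical helper."""
    z = rng.integers(0, 2, size=n).astype(float)
    a = rng.normal(size=n)                     # unobserved confounder
    w = 1.0 * z + a + rng.normal(size=n)       # endogenous regressor
    y = 0.1 * w + 0.5 * a + rng.normal(size=n)
    return y, w, z

y1, _, z1 = draw(100_000)      # data set 1: outcome + instrument
_, w2, z2 = draw(100_000)      # data set 2: treatment + instrument

Z1 = np.column_stack([np.ones_like(z1), z1])
Z2 = np.column_stack([np.ones_like(z2), z2])

# Step 1: first stage in data set 2
pi_hat = np.linalg.solve(Z2.T @ Z2, Z2.T @ w2)
# Step 2: cross-sample fitted values in data set 1
w12_hat = Z1 @ pi_hat
# Step 3: regress y1 on the cross-sample fitted values
X1 = np.column_stack([np.ones_like(y1), w12_hat])
rho_ssiv = np.linalg.lstsq(X1, y1, rcond=None)[0][1]

# Same number as a two-sample ratio: reduced form (set 1) / first stage (set 2)
reduced_form_1 = np.cov(y1, z1)[0, 1] / np.var(z1, ddof=1)
rho_ratio = reduced_form_1 / pi_hat[1]

print(f"SSIV: {rho_ssiv:.4f}  two-sample ratio: {rho_ratio:.4f}")
```

With a single instrument and no covariates, the SSIV estimate is exactly the reduced form from one sample divided by the first stage from the other.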

SSIV can also help reduce bias in over-identified models (discussed in Part 3).

Part 1 Summary

Concept	Key Point
IV Estimand	ρ = Cov(y, z) / Cov(s, z) = Reduced form ÷ First stage
Exclusion Restriction	z affects y only through its effect on s
2SLS	Replace endogenous variable with first-stage fitted values
Wald Estimator	Difference in outcome means ÷ Difference in treatment means (binary z)
Grouped Data = 2SLS	GLS on group means with dummy instruments equals 2SLS
Over-ID Test	Tests if all instruments produce the same estimate; limited practical value
TSIV / SSIV	First stage and reduced form can come from different datasets

The IV recipe:

Find an instrument that is (a) correlated with the treatment, and (b) uncorrelated with the error
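Estimate the first stage: if it's weak, worry (more in Part 3)
Look at the reduced form: it is estimated by OLS, so it is not subject to weak-instrument bias, and with an as-good-as-randomly-assigned instrument it is the causal effect of the instrument on the outcome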
Compute IV = reduced form ÷ first stage