Applied guide with an Angrist & Evans (1998) case study. Companion to Mostly Harmless Econometrics (MHE), Chapter 4
Treatment Effects: ATE, ATT, ITT, LATE
Core Message
Not "what is the effect?" but "the effect for whom?" The same treatment can yield different average effects (ATE, ATT, ITT, LATE) depending on the population being averaged over. Knowing which estimand your method identifies is essential for interpreting results correctly and for designing policy.
1. Treatment Effect Estimands
ATE Average Treatment Effect
The average causal effect across the entire population.
- Compares: everyone treated vs. everyone untreated
- Relevant when: considering universal policy (e.g., mandatory program for all)
- Challenge: each person's counterfactual outcome is never observed → identification requires strong assumptions or an RCT with full compliance
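In potential-outcomes notation (here $Y_{1i}$ and $Y_{0i}$ denote person $i$'s outcome with and without treatment, notation introduced for this sketch), the ATE and the reason a full-compliance RCT recovers it can be written as:

$$
\text{ATE} = E[Y_{1i}-Y_{0i}], \qquad
E[Y_i \mid D_i=1] - E[Y_i \mid D_i=0] = E[Y_{1i}] - E[Y_{0i}] = \text{ATE} \ \text{ if } D_i \perp (Y_{1i}, Y_{0i})
$$

Random assignment with full compliance makes $D_i$ independent of the potential outcomes, which is what licenses the second equality.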
ATT Average Treatment Effect on the Treated
The average causal effect among those who actually received treatment.
- Compares: treated group's actual outcome vs. what they would have experienced without treatment
- Relevant when: evaluating a voluntary program for its participants
- Typically ATT > ATE when high-benefit individuals self-select into treatment
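A toy illustration of selection on gains, using made-up potential outcomes that real data never reveal in full. Here ATT is $E[Y_{1i}-Y_{0i} \mid D_i=1]$; the selection rule (take treatment only if one's own gain exceeds 3) is a hypothetical assumption chosen to make high-benefit people self-select.

```python
from statistics import mean

# Hypothetical potential outcomes (y0, y1) for four people.
# Real data reveal only one element of each pair.
population = [(10, 11), (10, 12), (10, 14), (10, 18)]

# Assumed selection rule: people take the treatment only if their own gain exceeds 3
treated = [(y0, y1) for y0, y1 in population if y1 - y0 > 3]

ate = mean(y1 - y0 for y0, y1 in population)  # average gain over everyone
att = mean(y1 - y0 for y0, y1 in treated)     # average gain over the self-selected treated
print(ate, att)  # 3.75 6
```

Because only the two largest gains are selected, the ATT (6) exceeds the ATE (3.75).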
ITT Intent-to-Treat
The effect of being assigned to treatment, regardless of actual take-up.
- Z is assignment, D is actual treatment receipt
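- Unbiased whenever assignment Z is random: comparing groups by assignment preserves randomization even with non-compliance
- Reflects the realistic effect of offering a program (including non-participation)
- |ITT| ≤ |LATE|, because under the IV assumptions (exclusion restriction, monotonicity) ITT = LATE × compliance rate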
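A minimal simulation of ITT = LATE × compliance rate, under stated assumptions: a made-up population of compliers, always-takers, and never-takers (no defiers), assignment split exactly in half within each type as a stand-in for randomization, and Z affecting Y only through D. `TYPES`, `treatment`, and `group_mean` are hypothetical names.

```python
from statistics import mean

# Hypothetical population (illustrative numbers only): head count and a common
# gain Y1 - Y0 for each compliance type. No defiers, so monotonicity holds.
TYPES = {"complier": (40, 5.0), "always_taker": (20, 2.0), "never_taker": (40, 1.0)}
Y0 = 10.0  # same untreated outcome for everyone, to keep the arithmetic simple


def treatment(kind, z):
    """Treatment status of a given compliance type when assigned z."""
    return {"complier": z, "always_taker": 1, "never_taker": 0}[kind]


rows = []  # observed (z, d, y) triples
for kind, (count, gain) in TYPES.items():
    for i in range(count):
        z = i % 2  # exactly half of each type assigned: a balanced stand-in for randomization
        d = treatment(kind, z)
        y = Y0 + gain * d  # exclusion restriction: Z affects Y only through D
        rows.append((z, d, y))


def group_mean(col, z):
    """Mean of column `col` (1 = D, 2 = Y) among units with assignment z."""
    return mean(r[col] for r in rows if r[0] == z)


itt = group_mean(2, 1) - group_mean(2, 0)         # reduced form
compliance = group_mean(1, 1) - group_mean(1, 0)  # first stage = share of compliers
late = TYPES["complier"][1]                       # complier gain, known by construction

print(f"ITT = {itt:.2f}, compliance = {compliance:.2f}, LATE x compliance = {late * compliance:.2f}")
```

Changing the always-taker or never-taker gains leaves the ITT untouched, which is the sense in which neither ITT nor LATE carries information about those groups.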
LATE Local Average Treatment Effect
The average causal effect for compliers — those whose treatment status is changed by the instrument.
- Only for compliers — excludes always-takers and never-takers
- Requires monotonicity assumption (no defiers)
- Different instruments → different compliers → different LATEs
- RDD estimates are also local: a fuzzy RDD identifies a LATE for compliers at the cutoff
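A minimal sketch of the Wald estimator for a binary instrument: the reduced form (ITT) divided by the first stage (share of compliers). With a single binary instrument and no covariates, this ratio is numerically identical to 2SLS. `wald_estimate` is a hypothetical helper name, and the data are invented.

```python
def wald_estimate(z, d, y):
    """Wald (IV) estimate of LATE from a binary instrument.

    z, d, y: equal-length sequences of assignment (0/1), treatment (0/1), outcome.
    Returns reduced form / first stage.
    """
    def mean_given_z(values, z_value):
        selected = [v for v, zi in zip(values, z) if zi == z_value]
        return sum(selected) / len(selected)

    reduced_form = mean_given_z(y, 1) - mean_given_z(y, 0)  # ITT
    first_stage = mean_given_z(d, 1) - mean_given_z(d, 0)   # share of compliers
    if first_stage == 0:
        raise ValueError("no first stage: the instrument does not shift treatment")
    return reduced_form / first_stage


# Invented data: 4 units assigned (z=1), 4 not assigned (z=0)
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [8, 6, 7, 3, 6, 2, 3, 1]
print(wald_estimate(z, d, y))  # 6.0
```

Here the reduced form is 3.0 and the first stage is 0.5, so the estimated complier effect is 6.0, twice the ITT.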
Summary Comparison
| Estimand | Target Population | Primary Context | Method |
|---|---|---|---|
| ATE | Entire population | Universal policy effect | RCT (full compliance) |
| ATT | Treated group | Voluntary program evaluation | DID, Matching/PSM |
| ITT | Assigned group | RCT with non-compliance | Reduced form |
| LATE | Compliers | IV / RDD estimation | 2SLS, Wald estimator |
2. Case Study: Angrist & Evans (1998)
Research question: Does having a third child causally reduce female labor supply?
The Identification Problem
Simple OLS comparison of mothers with 2 vs. 3+ children confounds causation with selection: women who have more children may have inherently stronger family-orientation preferences, leading to both more children and less labor supply.
Two Instruments for a Third Child
Among mothers with ≥2 children, Angrist & Evans use two sources of exogenous variation in the probability of having a third child:
| | Twins at second birth | Same-sex (first two children) |
|---|---|---|
| Logic | Twins mechanically create ≥3 children | Parents prefer a mixed-sex sibship → more likely to try for a third |
| First stage | 0.625 (very strong) | 0.067 (modest) |
| Validity | Twin births are essentially random | Child sex composition is random |
Results
| Outcome | OLS | Twins IV | Same-sex IV |
|---|---|---|---|
| Employment | −0.167 | −0.083 | −0.135 |
| Weeks worked | −8.05 | −3.83 | −6.23 |
Why Estimates Differ: Different Compliers
Each instrument identifies effects for a different complier subpopulation:
| Characteristic | Sample Mean | Twins Ratio | Same-sex Ratio |
|---|---|---|---|
| Age ≥ 30 at first birth | 0.003 | 1.39 (overrepresented) | 1.00 (average) |
| College graduate | 0.132 | 1.14 (overrepresented) | 0.70 (underrepresented) |
Ratio > 1 means the characteristic is overrepresented among compliers relative to the population.
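Ratios like those in the table can be computed with a Bayes'-rule identity: for a binary characteristic x, P(x=1 | complier) / P(x=1) equals the first stage within the x=1 subgroup divided by the overall first stage, assuming Z is as good as random and there are no defiers. A sketch on invented data; `first_stage`, `complier_ratio`, and `make_cell` are hypothetical names.

```python
def first_stage(rows):
    """E[D | Z=1] - E[D | Z=0] for rows of (z, d, x)."""
    rates = {}
    for z_value in (0, 1):
        ds = [d for z, d, _ in rows if z == z_value]
        rates[z_value] = sum(ds) / len(ds)
    return rates[1] - rates[0]


def complier_ratio(rows):
    """P(x=1 | complier) / P(x=1) for a binary characteristic x.

    By Bayes' rule this equals the first stage within the x=1 subgroup divided
    by the overall first stage (assumes Z as good as random, no defiers).
    """
    subgroup = [r for r in rows if r[2] == 1]
    return first_stage(subgroup) / first_stage(rows)


def make_cell(z, x, n, n_treated):
    """n hypothetical units with assignment z and trait x, n_treated of them treated."""
    return [(z, 1 if k < n_treated else 0, x) for k in range(n)]


# Invented data: units with x=1 (40 of 100) respond more strongly to the instrument.
rows = (make_cell(1, 1, 20, 16) + make_cell(0, 1, 20, 4)
        + make_cell(1, 0, 30, 15) + make_cell(0, 0, 30, 6))
print(round(complier_ratio(rows), 3))  # 1.429
```

A ratio above 1 means x=1 is overrepresented among compliers, exactly as in the twins column for older and college-educated mothers.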
Twins compliers = mothers who would not have had a third child without twins
- Older, more educated, established careers
- Planned for 2 children → forced into 3 by twins
- → Labor supply impact is smaller (plausibly because stronger career attachment buffers the shock)
Same-sex compliers = mothers who had a third child due to sex-mix preference
- Younger, less educated, early career stage
- Strong family composition preferences
- → Labor supply impact is larger (plausibly because weaker career attachment and lower earnings mean a lower opportunity cost of leaving work)
Mapping to Treatment Effect Concepts
| Estimand | Interpretation in This Study | Value / Status |
|---|---|---|
| ATE | Effect of a 3rd child on all mothers with ≥2 children | Not identified; the two LATEs describe specific subgroups and need not bracket it |
| ATT | Effect on mothers who actually had a 3rd child | OLS (−0.167) compares mothers with 3+ vs. 2 children; it equals the ATT only if there is no selection bias |
| ITT | Effect of being "assigned" twins / same-sex children | Reduced form: e.g., twins RF on employment = −0.052 |
| LATE | Effect for mothers pushed into a 3rd child by the instrument | Twins: −0.083; Same-sex: −0.135 |
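As a consistency check on these figures: LATE = reduced form ÷ first stage implies reduced form = LATE × first stage. A quick sketch using the rounded numbers quoted in this guide; the same-sex value it prints is only implied by those numbers, not a reported estimate.

```python
# Rounded figures quoted in this guide (Angrist & Evans 1998, employment outcome).
first_stage = {"twins": 0.625, "same_sex": 0.067}
late = {"twins": -0.083, "same_sex": -0.135}

# Since LATE = reduced form / first stage, the implied reduced form is LATE * first stage.
implied_rf = {k: late[k] * first_stage[k] for k in late}
for k, rf in implied_rf.items():
    print(f"{k}: implied reduced form = {rf:.3f}")
```

The implied twins reduced form (−0.0519) matches the −0.052 in the table up to rounding. The much weaker same-sex first stage is why its reduced form is an order of magnitude smaller even though its LATE is larger.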
Lessons from This Study
- LATE ≠ ATE ≠ ATT. OLS (−0.167), Twins IV (−0.083), Same-sex IV (−0.135) all give different numbers for the same research question.
- Different instruments → different compliers → different LATEs. The choice of instrument determines whose effect you estimate.
- Complier characteristics explain the gap. The difference is systematic, not random — it traces back to the demographics of each complier group.
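- Policy implications change. An employment effect of −8.3 percentage points (twins IV) versus −16.7 points (OLS) implies a very different labor-market cost of a third child, and so different conclusions for, e.g., childcare policy.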
3. Mathematical Relationships
Population Subgroups Under Monotonicity
The instrument partitions the population into three groups (assuming no defiers):
| Group | Definition | Share |
|---|---|---|
| Compliers (C) | $d_{1i} = 1,\ d_{0i} = 0$ | $\pi_C = E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0]$ (the first stage) |
| Always-takers (AT) | $d_{1i} = d_{0i} = 1$ | $\pi_{AT} = E[D_i \mid Z_i=0]$ |
| Never-takers (NT) | $d_{1i} = d_{0i} = 0$ | $\pi_{NT} = 1 - E[D_i \mid Z_i=1]$ |
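The three shares follow directly from two observed take-up rates. A minimal sketch, assuming monotonicity and that Z has been coded so it raises take-up; `compliance_shares` is a hypothetical helper and the input rates are made up.

```python
def compliance_shares(p_d_given_z1, p_d_given_z0):
    """Complier, always-taker, and never-taker shares under monotonicity.

    p_d_given_z1 = E[D | Z=1], p_d_given_z0 = E[D | Z=0].
    """
    if p_d_given_z1 < p_d_given_z0:
        raise ValueError("negative first stage: recode Z so it raises take-up")
    return {
        "compliers": p_d_given_z1 - p_d_given_z0,  # first stage
        "always_takers": p_d_given_z0,             # treated even without Z
        "never_takers": 1 - p_d_given_z1,          # untreated even with Z
    }


# Hypothetical take-up rates: 60% treated when Z=1, 20% when Z=0
print(compliance_shares(0.6, 0.2))
```

The shares always sum to one, which is the partition property the table relies on.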
Decomposition of Each Estimand
ATE: Weighted average across all groups
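Under monotonicity the three groups exhaust the population, so the ATE is a share-weighted average of group-specific effects (writing $Y_{1i}, Y_{0i}$ for potential outcomes, notation introduced for this sketch):

$$
\text{ATE} = \pi_C\,E[Y_{1i}-Y_{0i}\mid C] + \pi_{AT}\,E[Y_{1i}-Y_{0i}\mid AT] + \pi_{NT}\,E[Y_{1i}-Y_{0i}\mid NT], \qquad \pi_C + \pi_{AT} + \pi_{NT} = 1
$$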
ATT: Compliers + Always-takers
Treated = always-takers + compliers assigned Z = 1. Never-takers are excluded (they are never treated).
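A sketch of the corresponding weights, assuming $Z_i$ is randomly assigned with $p = P(Z_i = 1)$ (a symbol introduced here). All always-takers are treated, but only the fraction $p$ of compliers is, so

$$
\text{ATT} = E[Y_{1i}-Y_{0i}\mid D_i=1] = \frac{\pi_{AT}\,E[Y_{1i}-Y_{0i}\mid AT] + p\,\pi_C\,E[Y_{1i}-Y_{0i}\mid C]}{\pi_{AT} + p\,\pi_C}
$$

The denominator is $P(D_i = 1)$. Setting $\pi_{AT} = 0$ collapses the ATT to the complier effect.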
ITT: LATE × Compliance rate
Unbiased whenever Z is randomly assigned (OLS of Y on Z). Smaller than LATE in magnitude because the compliance rate is below 1.
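The derivation, assuming random assignment and the exclusion restriction: always-takers and never-takers have the same $D_i$ whichever $Z_i$ they get, so their contributions to the difference in means cancel.

$$
\begin{aligned}
\text{ITT} &= E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0] \\
&= \pi_C\,E[Y_{1i}-Y_{0i}\mid C] + \pi_{AT}\cdot 0 + \pi_{NT}\cdot 0 \\
&= \pi_C \times \text{LATE}
\end{aligned}
$$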
LATE: Compliers only
Excludes always-takers and never-takers entirely.
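Dividing the ITT by the first stage isolates the compliers' effect (this requires $\pi_C > 0$); the resulting ratio is the Wald estimator:

$$
\text{LATE} = E[Y_{1i}-Y_{0i}\mid C] = \frac{E[Y_i \mid Z_i=1] - E[Y_i \mid Z_i=0]}{E[D_i \mid Z_i=1] - E[D_i \mid Z_i=0]} = \frac{\text{ITT}}{\pi_C}
$$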
Special Case: LATE = ATT (Bloom 1984)
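When there are no always-takers (one-sided non-compliance), i.e., E[D|Z=0] = 0, everyone treated is a complier assigned Z = 1, so the IV estimand equals the ATT:

$$
\frac{E[Y_i \mid Z_i=1]-E[Y_i \mid Z_i=0]}{E[D_i \mid Z_i=1]} = E[Y_{1i}-Y_{0i} \mid D_i=1] = \text{ATT}
$$

Example: JTPA training experiment. Applicants could not obtain training without being assigned to it, so everyone who trained was a complier, and IV = ITT ÷ compliance rate = ATT.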
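A small simulation of Bloom's result, under assumptions made up for illustration: only compliers and never-takers exist, assignment is balanced within each type, and complier gains are heterogeneous. The check compares ITT ÷ take-up with the true ATT, which uses gains that are unobservable in practice.

```python
# Hypothetical one-sided non-compliance: no always-takers, so E[D | Z=0] = 0.
Y0 = 10.0  # common untreated outcome, for simplicity

rows = []  # (z, d, y, true_gain); true gains are never observed in real data
for i in range(8):  # compliers: treated if and only if assigned
    z = i % 2
    gain = 2.0 + i // 2  # heterogeneous gains 2..5, balanced across z
    rows.append((z, z, Y0 + gain * z, gain))
for i in range(8):  # never-takers: never treated, so their gain is never realized
    rows.append((i % 2, 0, Y0, 7.0))


def avg(values):
    values = list(values)
    return sum(values) / len(values)


itt = avg(y for z, d, y, g in rows if z == 1) - avg(y for z, d, y, g in rows if z == 0)
take_up = avg(d for z, d, y, g in rows if z == 1)  # E[D | Z=1]
assert avg(d for z, d, y, g in rows if z == 0) == 0  # no always-takers

bloom_iv = itt / take_up                        # ITT / compliance rate
att = avg(g for z, d, y, g in rows if d == 1)   # true ATT, using unobservable gains
print(bloom_iv, att)  # 3.5 3.5
```

The never-takers' gain (7.0) never enters either number: with one-sided non-compliance the treated and the compliers-with-Z=1 are the same people.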
Size Relationships
| Relationship | Condition | Example |
|---|---|---|
| \|ITT\| ≤ \|LATE\| | Under exclusion and monotonicity; strict when compliance < 1 and LATE ≠ 0 | ITT = LATE × compliance rate |
| ATT ≥ ATE (typically) | High-benefit individuals self-select | Voluntary job training, college |
| LATE = ATT | No always-takers | JTPA experiment (Bloom 1984) |
| LATE₁ ≠ LATE₂ | Different IVs → different compliers | Angrist & Evans: Twins ≠ Same-sex |
| LATE = ATE | Homogeneous treatment effects | Constant effect for everyone |
Methodology → Estimand Connection
| Method | Estimates | Generalizability |
|---|---|---|
| RCT (full compliance) | ATE | Broad |
| RCT (non-compliance) + IV | LATE | Compliers only |
| DID | ATT | Groups similar to treated |
| RDD | LATE at cutoff | Near cutoff only |
| Matching / PSM | ATT | Groups similar to treated |
Takeaway
When reading or writing empirical research, always ask:
- What estimand does this method identify? (ATE, ATT, ITT, or LATE?)
- Who are the compliers? (If IV/RDD — whose effect are we learning about?)
- Does the estimand match the policy question? (Universal program → ATE; voluntary → ATT; offering a program → ITT; nudge that shifts take-up at the margin → LATE)
- Are the compliers relevant for the intended policy? (Pilot enthusiasts ≠ general population)
Suhyeon Lee