“95% of applied econometrics is concerned with averages. But what if we want to know what’s happening to the entire distribution?”
Limitations of OLS:
Advantages of Quantile Regression:
Definition:
Q_τ(y_i | X_i) = F_Y^{-1}(τ | X_i)
| τ value | Meaning |
|---|---|
| τ = 0.10 | Lower decile |
| τ = 0.50 | Median |
| τ = 0.90 | Upper decile |
| | CEF (OLS) | CQF (Quantile Reg) |
|---|---|---|
| Definition | E[y_i \| X_i] | Q_τ(y_i \| X_i) |
| Minimization | E[(y_i - m(X_i))²] | E[ρ_τ(y_i - q(X_i))] |
| Loss function | Squared error | Check function ρ_τ |
| Estimates | Conditional mean | Conditional quantile |
ρ_τ(u) = u · (τ - 1(u ≤ 0))
= τ·u if u > 0
= (τ-1)·u if u ≤ 0
Intuition: Asymmetric weighting on positive/negative residuals
| τ value | Weight on positive | Weight on negative | Result |
|---|---|---|---|
| 0.5 | 0.5 | 0.5 | Median (LAD) |
| 0.9 | 0.9 | 0.1 | Upper quantile |
| 0.1 | 0.1 | 0.9 | Lower quantile |
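The asymmetric loss above is two lines of code. A minimal Python sketch (the applied code later in these notes is Stata; the function name `rho` is mine):

```python
def rho(u, tau):
    # check function: weight tau on positive residuals, (1 - tau) on negative
    return u * (tau - (1.0 if u <= 0 else 0.0))
```

For τ = 0.5 the loss is symmetric (half the absolute error, i.e. LAD up to scale); for τ = 0.9 a positive residual of 1 costs 0.9 while a negative residual of 1 costs only 0.1.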
Population problem:
β_τ = arg min_{b∈R^d} E[ρ_τ(y_i - X_i'b)]
Sample analog: Solvable via linear programming
Linear model assumption:
Q_τ(y_i | X_i) = X_i'β_τ
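For intuition on the population problem, here is a brute-force sample analog for the intercept-only case; a minimizer of the check loss always sits at a data point, so searching the sample recovers the τ-quantile. With covariates the problem is the linear program noted above. A Python sketch (names mine):

```python
def rho(u, tau):
    # check-function loss, as defined above
    return u * (tau - (1.0 if u <= 0 else 0.0))

def quantile_by_minimization(y, tau):
    # sample analog of arg min E[rho_tau(y_i - q)] for an intercept-only model;
    # search candidate q over the data points themselves
    return min(y, key=lambda q: sum(rho(yi - q, tau) for yi in y))
```

On y = [1, 2, 3, 4, 5], τ = 0.5 returns the median 3 and τ = 0.9 returns 5.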
Model:
y_i ~ N(X_i'β, σ²)
CQF derivation:
P[y_i - X_i'β < σ·Φ^{-1}(τ) | X_i] = τ
Therefore:
Q_τ(y_i | X_i) = X_i'β + σ·Φ^{-1}(τ)
Characteristics:
Model:
y_i ~ N(X_i'β, (X_i'γ)²)
where X_i'γ > 0 for all X_i (so the conditional standard deviation is positive)
CQF derivation:
P[y_i - X_i'β < (X_i'γ)·Φ^{-1}(τ) | X_i] = τ
Therefore:
Q_τ(y_i | X_i) = X_i'β + (X_i'γ)·Φ^{-1}(τ)
= X_i'[β + γ·Φ^{-1}(τ)]
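The two Gaussian examples can be checked numerically with the standard library's `NormalDist`. This sketch (function names mine) shows that the homoskedastic CQF has the same slope at every τ (a pure location shift), while the heteroskedastic slope β + γ·Φ^{-1}(τ) fans out in τ:

```python
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function

def cqf_homoskedastic(x, beta, sigma, tau):
    # Q_tau(y|x) = x*beta + sigma*Phi^{-1}(tau): intercept shifts, slope fixed
    return x * beta + sigma * PHI_INV(tau)

def cqf_heteroskedastic(x, beta, gamma, tau):
    # Q_tau(y|x) = x*(beta + gamma*Phi^{-1}(tau)): slope varies with tau
    return x * (beta + gamma * PHI_INV(tau))
```

Differencing each CQF between x = 1 and x = 2 gives the slope: constant in τ for the first model, increasing in τ for the second.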
Characteristics:
| Census | Mean | SD | τ=0.10 | τ=0.25 | τ=0.50 | τ=0.75 | τ=0.90 | OLS |
|---|---|---|---|---|---|---|---|---|
| 1980 | 6.40 | 0.67 | .074 | .074 | .068 | .070 | .079 | .072 |
| 1990 | 6.46 | 0.60 | .112 | .110 | .106 | .111 | .137 | .114 |
| 2000 | 6.50 | 0.75 | .092 | .105 | .111 | .120 | .157 | .114 |
1980:
1990:
2000:
Policy implications:
Censoring: Some data is hidden above a threshold (not the same as a limited dependent variable!)
y_{i,obs} = min(y_i, c)
| Type | Example | Description |
|---|---|---|
| Top-coding | CPS high earnings | Privacy protection |
| Duration censoring | Unemployment >40 weeks | Follow-up period limit |
Note: Different from limited dependent variables (e.g., medical expenditure = 0)!
Key insight: Quantiles below censoring point are unaffected
Example: Top 10% censored → Estimation for τ ≤ 0.90 unaffected
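The key insight is easy to verify on artificial data: top-coding at c leaves every quantile at or below the censoring point unchanged, while upper quantiles are pushed down to c. A small Python check (helper name mine, using a crude order-statistic quantile):

```python
def sample_quantile(ys, tau):
    # crude order-statistic quantile, enough for the illustration
    s = sorted(ys)
    return s[min(len(s) - 1, int(tau * len(s)))]

latent = list(range(1, 101))             # latent outcomes 1..100
c = 91
observed = [min(v, c) for v in latent]   # top-coding: values above c recorded as c
```

The median and the 0.90 quantile of `observed` match those of `latent`; the 0.95 quantile does not.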
Censored QR model:
Q_τ(y_i | X_i) = min(c, X_i'β_τ)
Estimation:
β_τ^c = arg min_{b∈R^d} E{1[X_i'β_τ^c < c] · ρ_τ(y_i - X_i'b)}
Only use observations where X_i'β_τ^c < c
Problem: Don't know which observations have X_i'β_τ^c < c a priori
Solution: Iterative estimation
Step 1: Estimate β̂_τ ignoring censoring
Step 2: Find cells with X_i'β̂_τ < c
Step 3: Re-estimate β̂_τ using only those cells
Step 4: Repeat until convergence
Features:
Assumptions:
(i) The conditional density f_Y(y | X_i) exists almost surely
(ii) E[y_i], E[Q_τ(y_i | X_i)], and E‖X_i‖ are finite
Theorem:
β_τ = arg min_{b∈R^d} E[w_τ(X_i, b) · ε_τ²(X_i, b)]
where:
Specification error: ε_τ(X_i, b) ≡ X_i'b - Q_τ(y_i | X_i)
w_τ(X_i, b) = ∫_0^1 (1-u) · f_{ε(τ)}(u·ε_τ(X_i, b) | X_i) du
Approximation:
w_τ(X_i, β_τ) ≈ (1/2) · f_Y(Q_τ(y_i|X_i) | X_i)
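The approximation can be checked numerically: as the specification error goes to zero, the weight integral collapses to half the density at the CQF, since ∫_0^1 (1-u) du = 1/2. A sketch using a standard normal density as a stand-in for the conditional density (names and the midpoint-rule integrator are mine):

```python
from math import exp, pi, sqrt

def normal_pdf(v):
    # stand-in for the conditional density f(. | X_i)
    return exp(-v * v / 2) / sqrt(2 * pi)

def importance_weight(eps, n=20000):
    # w = integral_0^1 (1 - u) * f(u * eps) du, midpoint rule
    h = 1.0 / n
    return sum((1 - (i + 0.5) * h) * normal_pdf((i + 0.5) * h * eps) * h
               for i in range(n))
```

At eps = 0 the weight equals f(0)/2 exactly, and it stays close to f(0)/2 for small specification errors.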
| | OLS | Quantile Regression |
|---|---|---|
| Approximates | E[y_i \| X_i] | Q_τ(y_i \| X_i) |
| Weights | Histogram of X_i | w_τ(X_i) × Histogram |
| Emphasizes | Entire X_i distribution | X_i values where y_i is dense near CQF |
Key point: Even if CQF is not exactly linear, quantile regression provides the best linear approximation in a weighted least squares sense
1980 Census data, dependent variable: log wages, independent variable: schooling
| Estimator | Method | Features |
|---|---|---|
| CQ (Nonparametric) | Direct quantile computation at each schooling level | Nonparametric, requires large samples |
| QR (Quantile Regression) | ρ_τ minimization | Linear model assumption, weighted fit |
| MD (Minimum Distance) | Linear regression on CQF | Chamberlain (1994), histogram weighting |
Chamberlain (1994):
β̃_τ = arg min_{b∈R^d} E[(Q_τ(y_i|X_i) - X_i'b)²]
Interpretation: Regress Q_τ(y_i | X_i) on X_i → uses histogram weights
QR vs MD difference:
[Figure placeholder] Panels A-C (τ = 0.10, 0.50, 0.90): conditional quantiles of log wages at each schooling level (CQ, circles), with the QR (solid) and MD (dashed) fitted lines. Panels D-F: the corresponding weighting functions by schooling (8-18 years); the QR weights trace out a hump, peaking near the middle of the schooling distribution and falling off toward the extremes.
Observations:
“Training raised the lower decile” ≠ “Poor people became richer”
What quantile regression tells us:
What it doesn’t tell us:
Mathematical explanation:
We compare Q_τ(y₁ᵢ | X) vs Q_τ(y₀ᵢ | X)
Rank Preservation assumption:
Problem: Conditional quantile ≠ Marginal quantile
For expectations (simple):
E[y_i | X_i] = X_i'β
⟹ E[y_i] = E[X_i]'β (by iterated expectations)
For quantiles (complex):
Q_τ(y_i | X_i) = X_i'β_τ
⟹ Q_τ(y_i) ≠ E[X_i]'β_τ (in general!)
Why? Quantiles are nonlinear operators
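A tiny counterexample makes the point: means obey iterated expectations, medians do not. With two equal-sized x-cells (numbers are illustrative), the pooled mean equals the average of cell means, but the pooled median does not equal the average of cell medians:

```python
def lower_median(ys):
    # lower sample median: a deterministic convention for even n
    s = sorted(ys)
    return s[(len(s) - 1) // 2]

cell0 = [0, 0, 10]    # x = 0: conditional median 0, conditional mean 10/3
cell1 = [0, 10, 10]   # x = 1: conditional median 10, conditional mean 20/3
pooled = cell0 + cell1

mean_pooled = sum(pooled) / len(pooled)                     # 5.0
mean_of_cell_means = (sum(cell0) / 3 + sum(cell1) / 3) / 2  # 5.0 (up to rounding)
median_pooled = lower_median(pooled)                        # 0
avg_of_cell_medians = (lower_median(cell0) + lower_median(cell1)) / 2  # 5.0
```

Averaging the conditional medians gives 5, but the marginal (pooled) median is 0: the quantile operator does not pass through the distribution of X.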
Step 1: Relationship between conditional quantiles and conditional distribution
∫_0^1 1[F_Y^{-1}(τ|X_i) < y] dτ = F_Y(y|X_i)
Interpretation: Proportion of conditional quantiles below y = conditional CDF
Step 2: Substitute linear CQF
F_Y(y|X_i) = ∫_0^1 1[X_i'β_τ < y] dτ
Step 3: Integrate over X_i → Marginal CDF
F_Y(y) = ∫∫_0^1 1[X_i'β_τ < y] dτ dF_X(x)
Step 4: Marginal quantile = inverse of F_Y(y)
Q_τ(y_i) = inf{y : F_Y(y) ≥ τ}
Sample analog:
F̂_Y(y) = (1/n) Σ_i (1/100) Σ_{τ=0.01}^{1.00} 1[X_i'β̂_τ < y]
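The sample analog can be sketched in Python for the location-scale Gaussian CQF from earlier, where β_τ = β + γ·Φ^{-1}(τ) is known in closed form, so no estimation step is needed (function name and the 99-point τ grid are mine):

```python
from statistics import NormalDist

def marginal_cdf(y, xs, beta, gamma):
    # F_Y(y) ~ (1/n) sum_i (1/99) sum_tau 1[x_i * beta_tau < y],
    # with beta_tau = beta + gamma * Phi^{-1}(tau) (location-scale CQF)
    inv = NormalDist().inv_cdf
    taus = [t / 100 for t in range(1, 100)]   # tau = 0.01, ..., 0.99
    hits = sum(1 for x in xs for t in taus
               if x * (beta + gamma * inv(t)) < y)
    return hits / (len(xs) * len(taus))
```

With x ≡ 1, β = 0, γ = 1 the marginal distribution is standard normal, so the recovered CDF at 0 is about 0.5; inverting this grid-based CDF then gives marginal quantiles.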
Procedure:
Limitations:
Quantile regression also suffers from omitted variable bias
| Method | Estimand | Selection bias |
|---|---|---|
| OLS | Average effect | Present |
| Quantile Reg | Quantile effect | Present |
| 2SLS | Average causal effect | Removed |
| QTE | Quantile causal effect | Removed |
Extend LATE framework to quantiles (Abadie, Angrist, Imbens 2002)
Model:
Q_τ(y_i | X_i, d_i, complier) = α_τ·d_i + X_i'β_τ
Interpretation:
i.e.: Q_τ(y₁ᵢ | X_i, complier) - Q_τ(y₀ᵢ | X_i, complier) = α_τ
What α_τ means:
What α_τ does NOT mean:
Good news:
Kappa definition:
κ_i = 1 - d_i(1-z_i)/(1-P(z_i=1|X_i)) - (1-d_i)z_i/P(z_i=1|X_i)
Property: E[κ_i | complier] = 1, E[κ_i | non-complier] = 0
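Evaluating the kappa formula at the four (d_i, z_i) cells shows κ_i = 1 whenever d_i = z_i and κ_i < 0 whenever they differ (always-takers observed with z = 0, never-takers with z = 1), which is the source of the non-convexity discussed next. A Python sketch (function name mine):

```python
def kappa(d, z, pz):
    # Abadie's kappa weight; pz = P(z_i = 1 | X_i)
    return 1 - d * (1 - z) / (1 - pz) - (1 - d) * z / pz
```

With pz = 0.5, κ equals 1 in the d = z cells and -1 in the d ≠ z cells.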
QTE estimation:
(α_τ, β_τ) = arg min_{a,b} E[κ_i · ρ_τ(y_i - a·d_i - X_i'b)]
When d_i ≠ z_i, κ_i < 0 → minimand is non-convex → LP not applicable
Solution: Use iterated expectations
E[κ_i · ρ_τ(...)] = E[E[κ_i | y_i, d_i, X_i] · ρ_τ(...)]
where:
E[κ_i | y_i, d_i, X_i] = P[complier | y_i, d_i, X_i] ∈ [0, 1]
Formula:
E[κ_i | y_i, d_i, X_i] = 1 - d_i(1-E[z_i|y_i,d_i=1,X_i])/(1-P(z_i=1|X_i))
- (1-d_i)E[z_i|y_i,d_i=0,X_i]/P(z_i=1|X_i)
Step 1: In d_i = 1 subsample, Probit: z_i ~ y_i, X_i
→ Save Ê[z_i | y_i, d_i=1, X_i]
Step 2: In d_i = 0 subsample, Probit: z_i ~ y_i, X_i
→ Save Ê[z_i | y_i, d_i=0, X_i]
Step 3: In full sample, Probit: z_i ~ X_i
→ Save P̂(z_i=1 | X_i)
Step 4: Compute Ê[κ_i | y_i, d_i, X_i] using formula
- Trim to [0, 1] range
Step 5: Use as weights in Stata qreg for quantile regression
Step 6: Bootstrap entire procedure → standard errors
Assumption: CQF is exactly linear
Formula:
Var(β̂_τ) = τ(1-τ) · {E[f_u(0|X_i)X_i X_i']}^{-1} · E[X_i X_i'] · {E[f_u(0|X_i)X_i X_i']}^{-1}
where f_u(0 | X_i) = conditional density of the residual at 0
Homoskedastic case:
Var(β̂_τ) = τ(1-τ)/f_u²(0) · {E[X_i X_i']}^{-1}
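As a sanity check on the homoskedastic formula: with an intercept only, τ = 0.5, and standard normal errors, it reproduces the classic π/2 asymptotic variance of the sample median:

```python
from math import pi, sqrt

tau = 0.5
f0 = 1 / sqrt(2 * pi)             # N(0,1) density of the residual at 0
avar = tau * (1 - tau) / f0 ** 2  # tau(1-tau)/f_u(0)^2 = 0.25 * 2*pi = pi/2
```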
| Item | Description |
|---|---|
| Program | Job Training Partnership Act (1980s US) |
| Target | Disadvantaged workers |
| SDAs | 649 Service Delivery Areas |
| Sample | 15,981 individuals (30-month earnings data) |
| Variable | Definition |
|---|---|
| y_i | 30-month cumulative earnings |
| d_i | Actual training participation |
| z_i | Training offer (randomly assigned) |
| X_i | Race, education, marriage, age, prior work, etc. |
Selection bias present
| | OLS | τ=0.15 | τ=0.25 | τ=0.50 | τ=0.75 | τ=0.85 |
|---|---|---|---|---|---|---|
| Training | 3,754 | 1,187 | 2,510 | 4,420 | 4,678 | 4,806 |
| (s.e.) | (536) | (205) | (356) | (651) | (937) | (1,055) |
| % Impact | 21% | 136% | 75% | 35% | 17% | 13% |
Observation: Effect appears much larger at lower quantiles (136% vs 13%)
Selection bias removed
| | 2SLS | τ=0.15 | τ=0.25 | τ=0.50 | τ=0.75 | τ=0.85 |
|---|---|---|---|---|---|---|
| Training | 1,593 | 121 | 702 | 1,544 | 3,131 | 3,378 |
| (s.e.) | (895) | (475) | (670) | (1,073) | (1,376) | (1,811) |
| % Impact | 9% | 5% | 12% | 10% | 11% | 9% |
Observation: Lower quantile effects nearly disappear!
| Quantile | QR estimate | QTE estimate | Difference | Interpretation |
|---|---|---|---|---|
| 0.15 | $1,187 | $121 | −90% | Severe selection bias |
| 0.25 | $2,510 | $702 | −72% | Severe selection bias |
| 0.50 | $4,420 | $1,544 | −65% | Moderate |
| 0.75 | $4,678 | $3,131 | −33% | Less severe |
| 0.85 | $4,806 | $3,378 | −30% | Less severe |
Selection Bias Pattern:
Interpretation:
Policy Implications:
What distinguishes quantile regression from OLS, and when should it be used?
Answer:
| Aspect | OLS | Quantile Regression |
|---|---|---|
| Estimand | Conditional mean | Conditional quantile |
| Loss function | Squared error | Check function (asymmetric) |
| Distribution info | Mean only | Entire distribution |
| Outlier sensitivity | High | Low (especially median) |
When to use:
If coefficients are identical across quantiles, it’s a location shift. What does it mean when they differ?
Answer:
| Pattern | Meaning | Mathematical condition | Example |
|---|---|---|---|
| Identical coefficients | Location shift | Homoskedastic | 1980 education effect |
| Increasing coefficients | Inequality increase (fanning out) | Var(y\|X) increases | 2000 education effect |
| Decreasing coefficients | Inequality decrease (compression) | Var(y\|X) decreases | - |
2000 interpretation:
Explain why quantile regression estimates can have selection bias and how QTE addresses this.
Answer:
Selection bias example (JTPA):
How QTE solves this:
JTPA result:
Does “training raised the lower decile by $1,000” mean “poor people earned $1,000 more”?
Discussion points:
If education effects differ by quantile, what does this imply for overall inequality?
Discussion points:
The Q_τ(y | X) ≠ Q_τ(y) problem
QTE only estimates effects for compliers, like LATE. What limitations does this impose on policy implications?
Discussion points:
□ Estimate multiple quantiles (0.1, 0.25, 0.5, 0.75, 0.9)
□ Check how coefficients vary across quantiles
□ Determine location shift vs fanning out
□ Check for censoring issues (topcode, etc.)
□ Report standard errors (bootstrap recommended)
□ Verify median ≈ OLS (for symmetric distributions)
□ Verify instrument validity (LATE assumptions)
□ Check first stage strength
□ Estimate E[z|y,d,X] via Probit (separately for d=0, d=1)
□ Estimate P(z=1|X) via Probit
□ Compute kappa weights and trim to [0,1]
□ Run kappa-weighted quantile regression
□ Compare with QR estimates (assess selection bias magnitude)
□ Bootstrap standard errors
```stata
* Quantile Regression
qreg y x1 x2, quantile(0.5) vce(robust)

* Multiple quantiles
foreach q in 0.1 0.25 0.5 0.75 0.9 {
    qreg y x1 x2, quantile(`q')
}

* QTE (simplified)
* Steps 1-3: Probit
probit z y x1 x2 if d==1
predict pz_d1
probit z y x1 x2 if d==0
predict pz_d0
probit z x1 x2
predict pz

* Step 4: Kappa weights, trimmed to [0, 1]
gen kappa = 1 - d*(1-pz_d1)/(1-pz) - (1-d)*pz_d0/pz
replace kappa = 0 if kappa < 0
replace kappa = 1 if kappa > 1

* Step 5: Kappa-weighted quantile regression
qreg y d x1 x2 [pw=kappa], quantile(0.5)
```
ρ_τ(u) = u·(τ - 1(u ≤ 0))
β_τ = arg min E[ρ_τ(y_i - X_i'b)]
Q_τ(y_i | X_i) = X_i'β + σ·Φ^{-1}(τ)
Q_τ(y_i | X_i) = X_i'[β + γ·Φ^{-1}(τ)]
κ_i = 1 - d_i(1-z_i)/(1-p(X_i)) - (1-d_i)z_i/p(X_i)
where p(X_i) = P(z_i=1|X_i)
(α_τ, β_τ) = arg min_{a,b} E[κ_i · ρ_τ(y_i - a·d_i - X_i'b)]
F_Y(y) = ∫∫_0^1 1[X_i'β_τ < y] dτ dF_X(x)
Q_τ(y) = F_Y^{-1}(τ)
| | Exogenous d_i | Endogenous d_i |
|---|---|---|
| Mean | OLS | 2SLS |
| Quantile | Quantile Regression | QTE |
| Method | Estimand | Selection bias | Distribution info | Assumption |
|---|---|---|---|---|
| OLS | E[y\|X,d] | Present | Mean only | Linear CEF |
| 2SLS | E[y\|X,d] for compliers | Removed | Mean only | LATE assumptions |
| QR | Q_τ(y\|X,d) | Present | Entire distribution | Linear CQF |
| QTE | Q_τ(y\|X,d) for compliers | Removed | Entire distribution | LATE + Linear CQF |
Based on Angrist & Pischke, “Mostly Harmless Econometrics” Chapter 7