Suhyeon Lee - Digital Marketing Researcher

Chapter 7: Quantile Regression Study Guide


Key Concepts Overview

Why Quantile Regression?

“95% of applied econometrics is concerned with averages. But what if we want to know what’s happening to the entire distribution?”

Limitations of OLS:

  - Estimates only the conditional mean E[y_i | X_i]
  - Says nothing about how covariates shift the tails or spread of the distribution
  - Sensitive to outliers

Advantages of Quantile Regression:

  - Characterizes the entire conditional distribution through Q_τ(y_i | X_i)
  - Robust to outliers (especially the median)
  - Reveals heterogeneous effects across the distribution (e.g., fanning out)


7.1 Quantile Regression Model

Conditional Quantile Function (CQF)

Definition:

Q_τ(y_i | X_i) = F_Y^{-1}(τ | X_i)

τ value     Meaning
τ = 0.10    Lower decile
τ = 0.50    Median
τ = 0.90    Upper decile

CEF vs CQF Comparison

                CEF (OLS)              CQF (Quantile Reg)
Definition      E[y_i | X_i]           Q_τ(y_i | X_i)
Minimization    E[(y_i - m(X_i))²]     E[ρ_τ(y_i - q(X_i))]
Loss function   Squared error          Check function ρ_τ
Estimates       Conditional mean       Conditional quantile

Check Function (ρ_τ)

ρ_τ(u) = u · (τ - 1(u ≤ 0))
       = τ·u        if u > 0
       = (τ-1)·u    if u ≤ 0

Intuition: Asymmetric weighting on positive/negative residuals

τ value   Weight on positive   Weight on negative   Result
0.5       0.5                  0.5                  Median (LAD)
0.9       0.9                  0.1                  Upper quantile
0.1       0.1                  0.9                  Lower quantile
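The asymmetry of the check function can be verified numerically. The following is an illustrative Python sketch (not from the text); `check_fn` is a name chosen here:

```python
import numpy as np

def check_fn(u, tau):
    """rho_tau(u) = u * (tau - 1(u <= 0)): asymmetric absolute loss."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u <= 0))

# tau = 0.9 weights positive residuals (under-prediction) nine times
# as heavily as negative residuals, pushing the fitted line upward
assert check_fn(1.0, 0.9) == 0.9                   # tau * u for u > 0
assert abs(check_fn(-1.0, 0.9) - 0.1) < 1e-12      # (tau - 1) * u for u <= 0
assert check_fn(1.0, 0.5) == check_fn(-1.0, 0.5)   # symmetric at the median
```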

Quantile Regression Estimation

Population problem:

β_τ = arg min_{b∈R^d} E[ρ_τ(y_i - X_i'b)]

Sample analog: Solvable via linear programming

Linear model assumption:

Q_τ(y_i | X_i) = X_i'β_τ
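The sample analog can be sketched numerically (a numpy illustration of mine, not the text's code): for an intercept-only model, minimizing the empirical check loss recovers the sample τ-quantile; adding regressors turns the same problem into the linear program mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=2000)

def check_loss(b, tau):
    # empirical analog of E[rho_tau(y_i - b)] for a constant-only model
    u = y - b
    return np.sum(u * (tau - (u <= 0)))

# brute-force the minimizer on a fine grid; it matches the sample quantile
grid = np.linspace(-4, 4, 8001)
for tau in (0.1, 0.5, 0.9):
    b_hat = grid[np.argmin([check_loss(b, tau) for b in grid])]
    assert abs(b_hat - np.quantile(y, tau)) < 0.02
```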

Location Shift vs Heteroskedasticity

Case 1: Location Shift (Homoskedastic)

Model:

y_i ~ N(X_i'β, σ²)

CQF derivation:

P[y_i - X_i'β < σ·Φ^{-1}(τ) | X_i] = τ

Therefore:

Q_τ(y_i | X_i) = X_i'β + σ·Φ^{-1}(τ)

Characteristics:

  - Slope coefficients are identical at every quantile (all equal β)
  - Only the intercept shifts, by σ·Φ^{-1}(τ): the quantile regression lines are parallel
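A simulation sketch of the location-shift result (my illustration; β, σ, and the slice around x = 2 are arbitrary choices): in a homoskedastic normal model, every conditional quantile is the conditional mean shifted by σ·Φ^{-1}(τ).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(0, 4, n)
beta, sigma = 1.5, 2.0                       # arbitrary illustration values
y = beta * x + rng.normal(0, sigma, n)       # homoskedastic normal model

# empirical 0.9-quantile in a thin slice around x = 2 vs the formula
z90 = NormalDist().inv_cdf(0.90)
mask = (x > 1.9) & (x < 2.1)
q_emp = np.quantile(y[mask], 0.90)
q_theory = beta * 2.0 + sigma * z90          # X'beta + sigma * Phi^{-1}(tau)
assert abs(q_emp - q_theory) < 0.15
```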

Case 2: Heteroskedasticity (Linear Location-Scale Model)

Model:

y_i ~ N(X_i'β, (X_i'γ)²)

where X_i'γ > 0 (so the conditional standard deviation is positive)

CQF derivation:

P[y_i - X_i'β < (X_i'γ)·Φ^{-1}(τ) | X_i] = τ

Therefore:

Q_τ(y_i | X_i) = X_i'β + (X_i'γ)·Φ^{-1}(τ)
                = X_i'[β + γ·Φ^{-1}(τ)]

Characteristics:

  - Slopes vary with τ: the τ-quantile slope is β + γ·Φ^{-1}(τ)
  - Quantile regression lines fan out as τ moves from low to high quantiles
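A companion sketch for the location-scale case (again my own illustration with assumed β and γ): the quantile-specific slope β + γ·Φ^{-1}(τ) fans out across τ instead of staying parallel.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 400_000
x = rng.uniform(1, 3, n)
beta, gamma = 1.0, 0.5                       # scale X'gamma = 0.5 * x > 0
y = beta * x + gamma * x * rng.normal(size=n)

def slice_q(center, tau, width=0.1):
    # empirical conditional quantile in a thin slice around x = center
    m = np.abs(x - center) < width / 2
    return np.quantile(y[m], tau)

z = NormalDist().inv_cdf
for tau in (0.1, 0.9):
    # slope between conditional quantiles at x = 1.5 and x = 2.5
    slope = slice_q(2.5, tau) - slice_q(1.5, tau)
    assert abs(slope - (beta + gamma * z(tau))) < 0.1
# slopes differ across tau: fanning out, not a parallel shift
assert slice_q(2.5, 0.9) - slice_q(1.5, 0.9) > slice_q(2.5, 0.1) - slice_q(1.5, 0.1)
```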


Empirical Example: Returns to Education (Table 7.1.1)

Data

  - U.S. Census microdata: 1980, 1990, and 2000
  - Dependent variable: log wages; key regressor: years of schooling

Quantile Regression Results by Year

Census   Mean   SD     0.10   0.25   0.50   0.75   0.90   OLS
1980     6.40   0.67   .074   .074   .068   .070   .079   .072
1990     6.46   0.60   .112   .110   .106   .111   .137   .114
2000     6.50   0.75   .092   .105   .111   .120   .157   .114

(Mean and SD describe log wages; the quantile and OLS columns report schooling coefficients.)

Interpretation

1980:

  - Schooling coefficients are nearly constant across quantiles (≈ .07): a location shift

1990:

  - Coefficients are larger overall (≈ .11) and still fairly flat, with a modest uptick at the 0.90 quantile

2000:

  - Coefficients rise monotonically from .092 (τ = 0.10) to .157 (τ = 0.90): fanning out

Policy implications:

  - Returns to education rose over time, and by 2000 the gains were concentrated in the upper tail
  - Education increasingly predicts not just higher wages but greater within-group wage dispersion


7.1.1 Censored Quantile Regression

Problem Setting

Censoring: Some outcomes are hidden above a known threshold (not the same as a limited dependent variable!)

y_{i,obs} = min(y_i, c)

Type                 Example                     Description
Top-coding           CPS high earnings           Privacy protection
Duration censoring   Unemployment > 40 weeks     Follow-up period limit
Note: Different from limited dependent variables (e.g., medical expenditure = 0)!

Solution: Powell (1986)

Key insight: Quantiles below censoring point are unaffected

Example: Top 10% censored → Estimation for τ ≤ 0.90 unaffected
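Powell's insight can be checked directly (a numpy sketch with assumed values): top-coding replaces only the censored upper tail by the threshold c, so any quantile below the censoring point is untouched, while the mean is biased.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(10, 2, 100_000)               # latent outcome (illustrative)
c = np.quantile(y, 0.92)                     # top-code the highest ~8%
y_obs = np.minimum(y, c)

# quantiles at tau <= 0.90 are unaffected by the top-coding ...
for tau in (0.10, 0.50, 0.90):
    assert np.quantile(y_obs, tau) == np.quantile(y, tau)
# ... but the mean is biased downward
assert y_obs.mean() < y.mean()
```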

Censored QR model:

Q_τ(y_i | X_i) = min(c, X_i'β_τ)

Estimation:

β_τ^c = arg min_{b∈R^d} E{1[X_i'β_τ^c < c] · ρ_τ(y_i - X_i'b)}

Only use observations where the predicted quantile lies below the censoring point: X_i'β_τ^c < c

Buchinsky (1994) Iterative Algorithm

Problem: We don't know a priori which observations satisfy X_i'β_τ < c

Solution: Iterative estimation

Step 1: Estimate β̂_τ ignoring censoring
Step 2: Find cells with X_i'β̂_τ < c
Step 3: Re-estimate β̂_τ using only those cells
Step 4: Repeat until convergence

Features:

  - Each iteration is a standard (uncensored) quantile regression on a trimmed sample
  - Implementable with off-the-shelf quantile regression software


7.1.2 Quantile Regression Approximation Property

Theorem 7.1.1 (Angrist, Chernozhukov, Fernandez-Val 2006)

Assumptions:

  - y_i is continuously distributed given X_i, with conditional density f_Y(· | X_i)
  - β_τ is well defined (a unique minimizer)

Theorem:

β_τ = arg min_{b∈R^d} E[w_τ(X_i, b) · ε_τ²(X_i, b)]

where:

  - ε_τ(X_i, b) = X_i'b − Q_τ(y_i | X_i) is the CQF specification error
  - w_τ(X_i, b) is an average conditional density of y_i over the interval between X_i'b and the true CQF

Approximation:

w_τ(X_i, β_τ) ≈ (1/2) · f_Y(Q_τ(y_i|X_i) | X_i)

Intuitive Meaning

              OLS                       Quantile Regression
Approximates  E[y_i | X_i]              Q_τ(y_i | X_i)
Weights       Histogram of X_i          w_τ(X_i) × Histogram
Emphasizes    Entire X_i distribution   X_i values where y_i is dense near CQF

Key point: Even if CQF is not exactly linear, quantile regression provides the best linear approximation in a weighted least squares sense


Figure 7.1.1: QR vs MD vs Nonparametric CQF

Comparison of Three Estimators

1980 Census data, dependent variable: log wages, independent variable: schooling

Estimator Method Features
CQ (Nonparametric) Direct quantile computation at each schooling level Nonparametric, requires large samples
QR (Quantile Regression) ρ_τ minimization Linear model assumption, weighted fit
MD (Minimum Distance) Linear regression on CQF Chamberlain (1994), histogram weighting

Minimum Distance (MD) Estimator

Chamberlain (1994):

β̃_τ = arg min_{b∈R^d} E[(Q_τ(y_i|X_i) - X_i'b)²]

Interpretation: Regress Q_τ(y_i|X_i) on X_i → uses histogram weights

QR vs MD difference:

  - MD weights the approximation only by the distribution (histogram) of X_i
  - QR additionally applies the density weights w_τ, emphasizing X_i values where y_i is dense near the CQF

Figure Interpretation (Panels A-C)

Panels A–C (τ = 0.10, 0.50, 0.90): log wages against schooling, showing the nonparametric CQ points (○), the QR fit (solid line), and the MD fit (dashed line); the linear fits track the nonparametric CQF closely at each quantile.

Figure Interpretation (Panels D-F): Weighting Functions

Panels D–F: the QR weighting function plotted against schooling is hump-shaped, peaking in the middle of the schooling distribution (around 12–14 years) and tapering off toward 8 and 18 years.

Observations:


7.1.3 Tricky Points

Tricky Point 1: Individual Effects vs Distributional Effects

“Training raised the lower decile” ≠ “Poor people became richer”

What quantile regression tells us:

  - How treatment changes the quantiles of the outcome distribution

What it doesn't tell us:

  - Which individuals moved within the distribution; it does not follow the same person across treatment states

Mathematical explanation:

  - A quantile treatment effect compares Q_τ(y_1i) with Q_τ(y_0i), two different distributions; the person at the τ-quantile under treatment need not be the person at the τ-quantile without it

Rank Preservation assumption:

  - If individuals keep the same rank in the treated and untreated outcome distributions, quantile differences can be read as individual effects; otherwise they are only distributional effects

Tricky Point 2: Conditional vs Marginal Quantiles

Problem: Conditional quantile ≠ Marginal quantile

For expectations (simple):

E[y_i | X_i] = X_i'β  
⟹  E[y_i] = E[X_i]'β   (by iterated expectations)

For quantiles (complex):

Q_τ(y_i | X_i) = X_i'β_τ  
⟹  Q_τ(y_i) ≠ E[X_i]'β_τ   (in general!)

Why? Quantiles are nonlinear operators
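A two-point example (my own construction) makes the nonlinearity concrete: expectations pass through E[X_i], quantiles do not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
x = rng.choice([0.0, 4.0], size=n)           # two-point X distribution
y = x + rng.normal(0, 0.1, n)                # Q_tau(y|x) = x + 0.1 * z_tau

# iterated expectations: E[y] = E[x] (here beta = 1)
assert abs(y.mean() - x.mean()) < 0.01
# but Q_0.25(y) sits inside the lower cluster near 0, far from E[x] = 2
assert abs(np.quantile(y, 0.25) - x.mean()) > 1.5
```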

Extracting Marginal Quantiles: Detailed Procedure

Step 1: Relationship between conditional quantiles and conditional distribution

∫_0^1 1[F_Y^{-1}(τ|X_i) < y] dτ = F_Y(y|X_i)

Interpretation: Proportion of conditional quantiles below y = conditional CDF

Step 2: Substitute linear CQF

F_Y(y|X_i) = ∫_0^1 1[X_i'β_τ < y] dτ

Step 3: Integrate over X_i → Marginal CDF

F_Y(y) = ∫∫_0^1 1[X_i'β_τ < y] dτ dF_X(x)

Step 4: Marginal quantile = inverse of F_Y(y)

Q_τ(y_i) = inf{y : F_Y(y) ≥ τ}

Practical Estimation (Machado & Mata 2005)

Sample analog:

F̂_Y(y) = (1/n) Σ_i (1/99) Σ_{τ=0.01}^{0.99} 1[X_i'β̂_τ < y]

Procedure:

  1. Estimate 99 quantile regressions at τ = 0.01, 0.02, …, 0.99
  2. Compute 99 predicted values for each X_i
  3. Compute empirical distribution of all predicted values
  4. Extract marginal quantiles from this distribution
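The procedure above can be sketched with known coefficients (a simulation of mine using the location-scale model, where β_τ = β + γ·Φ^{-1}(τ) is known in closed form, sidestepping the 99 first-stage estimations):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(5)
n = 50_000
x = rng.uniform(1, 2, n)
y = 2 * x + x * rng.normal(size=n)           # Q_tau(y|x) = x * (2 + z_tau)

z = NormalDist().inv_cdf
taus = np.arange(1, 100) / 100               # tau = 0.01, ..., 0.99
beta_tau = 2 + np.array([z(t) for t in taus])

# pool the predicted conditional quantiles X_i' beta_tau over i and tau;
# their empirical distribution approximates the marginal distribution of y
pred = x[:, None] * beta_tau[None, :]
for tau in (0.1, 0.5, 0.9):
    assert abs(np.quantile(pred.ravel(), tau) - np.quantile(y, tau)) < 0.1
```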

Limitations:

  - Assumes the CQF is linear at every τ
  - Fitted quantiles can cross in finite samples
  - Inference must account for all 99 first-stage estimates: bootstrap the entire procedure


7.2 Quantile Treatment Effects (QTE)

Problem: Selection Bias in Quantile Regression

Quantile regression also suffers from omitted variable bias

Method Estimand Selection bias
OLS Average effect Present
Quantile Reg Quantile effect Present
2SLS Average causal effect Removed
QTE Quantile causal effect Removed

QTE Idea

Extend LATE framework to quantiles (Abadie, Angrist, Imbens 2002)

Model:

Q_τ(y_i | X_i, d_i, complier) = α_τ·d_i + X_i'β_τ

Interpretation:

  - α_τ is the effect of treatment on the τ-quantile of outcomes, among compliers, conditional on X_i

Important Distinction

What α_τ means:

  - The difference between the τ-quantiles of the treated and untreated complier outcome distributions

What α_τ does NOT mean:

  - The treatment effect for the particular individual sitting at the τ-quantile (no rank preservation is assumed)

Good news:

  - With a valid instrument, the entire marginal distributions of y_0i and y_1i for compliers are identified

QTE Estimator: Abadie Kappa

Kappa definition:

κ_i = 1 - d_i(1-z_i)/(1-P(z_i=1|X_i)) - (1-d_i)z_i/P(z_i=1|X_i)

Property: E[κ_i | complier] = 1, E[κ_i | non-complier] = 0

QTE estimation:

(α_τ, β_τ) = arg min_{a,b} E[κ_i · ρ_τ(y_i - a·d_i - X_i'b)]
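A simulation sketch of the kappa weights (the type shares here are illustrative choices of mine): compliers get weight 1 and non-compliers weight 0 on average, so kappa-weighted moments are complier moments.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
p = 0.5                                      # P(z = 1), no covariates here
z = rng.binomial(1, p, n)
# types: 0 = never-taker, 1 = complier, 2 = always-taker (shares assumed)
typ = rng.choice([0, 1, 2], size=n, p=[0.2, 0.6, 0.2])
d = np.where(typ == 1, z, (typ == 2).astype(int))

kappa = 1 - d * (1 - z) / (1 - p) - (1 - d) * z / p

# E[kappa] equals the complier share ...
assert abs(kappa.mean() - 0.6) < 0.01
# ... and kappa-weighted averages recover complier-only averages
g = (typ == 1).astype(float)                 # a statistic we try to recover
assert abs((kappa * g).mean() / kappa.mean() - 1.0) < 0.01
```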

Practical Issues and Solutions for QTE Implementation

Problem 1: κ_i can be negative

When d_i ≠ z_i, κ_i < 0 → minimand is non-convex → LP not applicable

Solution: Use iterated expectations

E[κ_i · ρ_τ(...)] = E[E[κ_i | y_i, d_i, X_i] · ρ_τ(...)]

where:

E[κ_i | y_i, d_i, X_i] = P[complier | y_i, d_i, X_i] ∈ [0, 1]

Problem 2: Need to estimate E[κ_i | y_i, d_i, X_i]

Formula:

E[κ_i | y_i, d_i, X_i] = 1 - d_i(1-E[z_i|y_i,d_i=1,X_i])/(1-P(z_i=1|X_i)) 
                          - (1-d_i)E[z_i|y_i,d_i=0,X_i]/P(z_i=1|X_i)

QTE Implementation Steps (Angrist 2001)

Step 1: In d_i = 1 subsample, Probit: z_i ~ y_i, X_i
        → Save Ê[z_i | y_i, d_i=1, X_i]

Step 2: In d_i = 0 subsample, Probit: z_i ~ y_i, X_i
        → Save Ê[z_i | y_i, d_i=0, X_i]

Step 3: In full sample, Probit: z_i ~ X_i
        → Save P̂(z_i=1 | X_i)

Step 4: Compute Ê[κ_i | y_i, d_i, X_i] using formula
        - Trim to [0, 1] range

Step 5: Use as weights in Stata qreg for quantile regression

Step 6: Bootstrap entire procedure → standard errors

Standard Errors for Quantile Regression

Conventional Standard Errors (Stata qreg, robust)

Assumption: CQF is exactly linear

Formula:

Var(β̂_τ) = τ(1-τ) · {E[f_u(0|X_i)X_i X_i']}^{-1} · E[X_i X_i'] · {E[f_u(0|X_i)X_i X_i']}^{-1}

where f_u(0|X_i) = conditional density of the residual at 0

Homoskedastic case:

Var(β̂_τ) = τ(1-τ)/f_u²(0) · {E[X_i X_i']}^{-1}
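The homoskedastic formula can be sanity-checked by Monte Carlo (an intercept-only median regression of a standard normal sample, an illustration of mine): Var(β̂_0.5) ≈ τ(1−τ)/(n·f_u²(0)) = π/(2n).

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 400, 5_000
sims = rng.normal(0, 1, (reps, n))
med_var = np.median(sims, axis=1).var()      # simulated Var(sample median)

# tau(1 - tau) / (n * f(0)^2) with f(0) = 1/sqrt(2*pi) for N(0, 1)
asy_var = 0.25 / (n * (1 / np.sqrt(2 * np.pi)) ** 2)
assert abs(med_var / asy_var - 1) < 0.15
```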

Robust Standard Errors (Angrist, Chernozhukov, Fernandez-Val 2006)

  - Valid even when the linear model is only an approximation to the CQF

Bootstrap

  - Resample observations and re-estimate; recommended in practice, and essential for QTE because the kappa weights are themselves estimated


Empirical Example: JTPA Training Program (Table 7.2.1)

Background

Item Description
Program Job Training Partnership Act (1980s US)
Target Disadvantaged workers
SDAs 649 Service Delivery Areas
Sample 15,981 individuals (30-month earnings data)

Variable Definitions

Variable Definition
y_i 30-month cumulative earnings
d_i Actual training participation
z_i Training offer (randomly assigned)
X_i Race, education, marriage, age, prior work, etc.

Compliance Situation

  - Only about 60% of those offered training (z_i = 1) actually enrolled (d_i = 1)
  - A small share of controls obtained similar training elsewhere
  - Random assignment of the offer makes z_i a valid instrument for d_i

Results Comparison: Panel A (OLS & QR)

Selection bias present

           OLS      τ=0.15   τ=0.25   τ=0.50   τ=0.75   τ=0.85
Training   3,754    1,187    2,510    4,420    4,678    4,806
(s.e.)     (536)    (205)    (356)    (651)    (937)    (1,055)
% Impact   21%      136%     75%      35%      17%      13%

Observation: Effect appears much larger at lower quantiles (136% vs 13%)

Results Comparison: Panel B (2SLS & QTE)

Selection bias removed

           2SLS     τ=0.15   τ=0.25   τ=0.50   τ=0.75   τ=0.85
Training   1,593    121      702      1,544    3,131    3,378
(s.e.)     (895)    (475)    (670)    (1,073)  (1,376)  (1,811)
% Impact   9%       5%       12%      10%      11%      9%

Observation: Lower quantile effects nearly disappear!

QR vs QTE Comparison

Quantile   QR estimate   QTE estimate   Difference   Interpretation
0.15       $1,187        $121           −90%         Severe selection bias
0.25       $2,510        $702           −72%         Severe selection bias
0.50       $4,420        $1,544         −65%         Moderate
0.75       $4,678        $3,131         −33%         Less severe
0.85       $4,806        $3,378         −30%         Less severe

Key Findings

Selection Bias Pattern:

  - Bias is largest at the bottom of the distribution (−90% at τ = 0.15) and shrinks toward the top

Interpretation:

  - Low-quantile QR "effects" largely reflect who selects into training, not what training does
  - Once selection is removed, effects are roughly uniform at about 10% across quantiles

Policy Implications:

  - Training raised earnings modestly throughout the distribution; it did not transform the lower tail


Three Key Questions

Q1. Quantile Regression vs OLS

What distinguishes quantile regression from OLS, and when should it be used?

Answer:

Aspect OLS Quantile Regression
Estimand Conditional mean Conditional quantile
Loss function Squared error Check function (asymmetric)
Distribution info Mean only Entire distribution
Outlier sensitivity High Low (especially median)

When to use:


Q2. Location Shift vs Fanning Out

If coefficients are identical across quantiles, it’s a location shift. What does it mean when they differ?

Answer:

Pattern                   Meaning                             Mathematical condition   Example
Identical coefficients    Location shift                      Homoskedastic            1980 education effect
Increasing coefficients   Inequality increase (fanning out)   Var(y|X) increases       2000 education effect
Decreasing coefficients   Inequality decrease (compression)   Var(y|X) decreases       -

2000 interpretation:

  - Schooling coefficients rise from .092 (τ = 0.10) to .157 (τ = 0.90): fanning out
  - Higher education is associated with greater within-group wage dispersion, contributing to inequality


Q3. Need for QTE

Explain why quantile regression estimates can have selection bias and how QTE addresses this.

Answer:

Selection bias example (JTPA):

  - QR shows a 136% training effect at τ = 0.15, but participation is voluntary: motivated workers self-select
  - QTE, exploiting the random offer, reduces the τ = 0.15 estimate to $121 (about 5%)

How QTE solves this:

  1. Apply IV logic to quantile regression
  2. Weight by Abadie Kappa for compliers
  3. Use random assignment (z_i) as instrument
  4. Kappa-weighted quantile regression

JTPA result:

  - After removing selection bias, effects are roughly constant (≈ 10%) across the distribution


Discussion Questions

D1. Individual Effects vs Distributional Effects

Does “training raised the lower decile by $1,000” mean “poor people earned $1,000 more”?

Discussion points:

D2. Conditional vs Marginal Quantiles

If education effects differ by quantile, what does this imply for overall inequality?

Discussion points:

D3. Limitations of QTE

QTE only estimates effects for compliers, like LATE. What limitations does this impose on policy implications?

Discussion points:


Practical Checklist

When Running Quantile Regression

□ Estimate multiple quantiles (0.1, 0.25, 0.5, 0.75, 0.9)
□ Check how coefficients vary across quantiles
□ Determine location shift vs fanning out
□ Check for censoring issues (topcode, etc.)
□ Report standard errors (bootstrap recommended)
□ Verify median ≈ OLS (for symmetric distributions)

When Running QTE

□ Verify instrument validity (LATE assumptions)
□ Check first stage strength
□ Estimate E[z|y,d,X] via Probit (separately for d=0, d=1)
□ Estimate P(z=1|X) via Probit
□ Compute kappa weights and trim to [0,1]
□ Run kappa-weighted quantile regression
□ Compare with QR estimates (assess selection bias magnitude)
□ Bootstrap standard errors

Stata Code Example

* Quantile Regression
qreg y x1 x2, quantile(0.5) vce(robust)

* Multiple quantiles
foreach q in 0.1 0.25 0.5 0.75 0.9 {
    qreg y x1 x2, quantile(`q')
}

* QTE (simplified)
* Steps 1-3: Probit
probit z y x1 x2 if d==1
predict pz_d1
probit z y x1 x2 if d==0  
predict pz_d0
probit z x1 x2
predict pz

* Step 4: Kappa
gen kappa = 1 - d*(1-pz_d1)/(1-pz) - (1-d)*pz_d0/pz
replace kappa = 0 if kappa < 0
replace kappa = 1 if kappa > 1

* Step 5: Weighted QR (kappa trimmed to [0,1] above)
qreg y d x1 x2 [pw=kappa], quantile(0.5)

* Step 6: bootstrap Steps 1-5 as one program for valid standard errors

Key Formulas Summary

Check Function

ρ_τ(u) = u·(τ - 1(u ≤ 0))

Quantile Regression

β_τ = arg min E[ρ_τ(y_i - X_i'b)]

Location Shift Model

Q_τ(y_i | X_i) = X_i'β + σ·Φ^{-1}(τ)

Heteroskedastic Model (Location-Scale)

Q_τ(y_i | X_i) = X_i'[β + γ·Φ^{-1}(τ)]

Abadie Kappa

κ_i = 1 - d_i(1-z_i)/(1-p(X_i)) - (1-d_i)z_i/p(X_i)

where p(X_i) = P(z_i=1|X_i)

QTE Estimator

(α_τ, β_τ) = arg min E[κ_i · ρ_τ(y_i - α·d_i - X_i'b)]

Conditional → Marginal Quantile

F_Y(y) = ∫∫_0^1 1[X_i'β_τ < y] dτ dF_X(x)
Q_τ(y) = F_Y^{-1}(τ)

Comprehensive Comparison: OLS vs QR vs 2SLS vs QTE

2×2 Framework

  Exogenous d_i Endogenous d_i
Mean OLS 2SLS
Quantile Quantile Regression QTE

Detailed Comparison

Method   Estimand                   Selection bias   Distribution info     Assumption
OLS      E[y|X,d]                   Present          Mean only             Linear CEF
2SLS     E[y|X,d] for compliers     Removed          Mean only             LATE assumptions
QR       Q_τ(y|X,d)                 Present          Entire distribution   Linear CQF
QTE      Q_τ(y|X,d) for compliers   Removed          Entire distribution   LATE + Linear CQF

Based on Angrist & Pischke, “Mostly Harmless Econometrics” Chapter 7