| Track | Requirement |
|---|---|
| Track 1 | Replicate and extend a landmark causal ML paper using original (or similar) data |
| Track 2 | Conduct a structured literature review on a specific ML-in-economics domain |
| Report | 12–15-page academic paper in LaTeX |
| Weight | 40% of final grade |
| Due Date | May 5, 2027 |
Paper: Chernozhukov et al. (2018) - Double Machine Learning
Question: What is the causal effect of 401(k) eligibility on net financial assets?
Source: 1991 Survey of Income and Program Participation (SIPP) (~9,915 observations)
# Download Option A: doubleml package
pip install doubleml
from doubleml.datasets import fetch_401K
data = fetch_401K(return_type='DataFrame')
# Download Option B: similar variables from the Federal Reserve's Survey of Consumer Finances
# https://www.federalreserve.gov/econres/scfindex.htm
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold
def dml_plr(Y, D, X, ml_g, ml_m, n_folds=5):
"""
Double Machine Learning for Partially Linear Model
Y = θ*D + g(X) + ε
TODO:
1. Split data into K folds
2. Train ml_g (outcome model) and ml_m (propensity model)
on training set, predict on test set
3. Compute residuals: Ỹ = Y - ĝ(X), D̃ = D - m̂(X)
4. Estimate θ = mean(Ỹ * D̃) / mean(D̃²)
5. Compute standard errors
Returns: theta, standard_error
"""
n = len(Y)
Y_hat = np.zeros(n)
D_hat = np.zeros(n)
kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)
for train_idx, test_idx in kf.split(X):
# YOUR CODE HERE
pass
return theta, se
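The five steps in the skeleton's docstring can be sketched end to end as follows. This is a reference sketch, not the required submission: the name `dml_plr_reference` and the `LassoCV` default nuisance learners are illustrative choices.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_plr_reference(Y, D, X, ml_g=None, ml_m=None, n_folds=5):
    """Cross-fitted DML estimate of theta in Y = theta*D + g(X) + eps."""
    ml_g = ml_g if ml_g is not None else LassoCV(cv=3)  # outcome model
    ml_m = ml_m if ml_m is not None else LassoCV(cv=3)  # treatment model
    n = len(Y)
    Y_hat = np.zeros(n)  # out-of-fold predictions g_hat(X)
    D_hat = np.zeros(n)  # out-of-fold predictions m_hat(X)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)
    for train_idx, test_idx in kf.split(X):
        # fit each nuisance model on the training folds, predict held-out fold
        Y_hat[test_idx] = clone(ml_g).fit(X[train_idx], Y[train_idx]).predict(X[test_idx])
        D_hat[test_idx] = clone(ml_m).fit(X[train_idx], D[train_idx]).predict(X[test_idx])
    Y_res = Y - Y_hat  # Y-tilde
    D_res = D - D_hat  # D-tilde
    theta = np.mean(Y_res * D_res) / np.mean(D_res ** 2)
    # standard error from the Neyman-orthogonal score psi = (Ytilde - theta*Dtilde)*Dtilde
    psi = (Y_res - theta * D_res) * D_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(D_res ** 2) ** 2 / n)
    return theta, se
```

On a simulated partially linear model with a known theta, the estimate should land close to the truth, which is a useful sanity check before touching the 401(k) data.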
Paper: Athey & Imbens (2016) - Causal Trees and Forests
Question: Are the effects of job training heterogeneous? Who benefits most?
Source: LaLonde (1986) NSW data (297 treated, 425 control)
# Download Option A: Dehejia-Wahba
# https://users.nber.org/~rdehejia/data/.nswdata2.html
# Download Option B: causaldata package
pip install causaldata
from causaldata import nsw_mixtape
data = nsw_mixtape.load_pandas().data
class CausalTree:
"""
Key difference from standard tree:
- Standard: minimize prediction MSE
- Causal: maximize |τ_left - τ_right| (treatment effect heterogeneity)
"""
def fit(self, X, Y, D):
"""
X: covariates, Y: outcome, D: treatment
TODO: Implement recursive partitioning that maximizes
treatment effect differences between leaves
"""
pass
def _treatment_effect(self, Y, D):
"""τ = E[Y|D=1] - E[Y|D=0]"""
# YOUR CODE HERE
pass
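As a sketch of the splitting criterion, the snippet below searches one feature for the threshold that maximizes |τ_left − τ_right|. The function names and the `min_leaf` guard are illustrative assumptions, and this deliberately omits the honest sample-splitting that Athey & Imbens use for valid inference; it only shows the heterogeneity-seeking objective.

```python
import numpy as np

def treatment_effect(Y, D):
    """Naive difference in means: tau = E[Y|D=1] - E[Y|D=0]."""
    return Y[D == 1].mean() - Y[D == 0].mean()

def best_split(X, Y, D, feature, min_leaf=20):
    """Scan thresholds on one feature; return the (threshold, gap) that
    maximizes |tau_left - tau_right|, or (None, 0.0) if no valid split."""
    best_thr, best_gap = None, 0.0
    for thr in np.unique(X[:, feature])[1:]:  # skip min so left leaf is nonempty
        left = X[:, feature] < thr
        right = ~left
        # each leaf needs both treated and control units and enough observations
        ok = all(min(np.sum(D[m] == 1), np.sum(D[m] == 0)) >= 1
                 and m.sum() >= min_leaf for m in (left, right))
        if not ok:
            continue
        gap = abs(treatment_effect(Y[left], D[left])
                  - treatment_effect(Y[right], D[right]))
        if gap > best_gap:
            best_thr, best_gap = thr, gap
    return best_thr, best_gap
```

On data where the effect jumps at X₁ = 0, the search should recover a threshold near zero, which is a quick way to check your recursive version before running it on the NSW sample.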
Paper: Callaway & Sant'Anna (2021) - Staggered DiD
Question: Effect of minimum wage increases with staggered adoption?
Source: Card & Krueger (1994) OR state-level panel
# Option A: Card-Krueger replication
# https://github.com/tyleransom/DiD-example
# Option B: QCEW data
# https://www.bls.gov/cew/
def compute_att_gt(df, group, time):
"""
Compute ATT(g,t) following Callaway & Sant'Anna (2021)
Group g: units first treated at time g
Compare to never-treated or not-yet-treated
ATT(g,t) = [E[Y_t|g] - E[Y_{g-1}|g]] - [E[Y_t|control] - E[Y_{g-1}|control]]
(baseline period g-1 is the last untreated period for group g)
TODO: Implement group-time ATT calculation
"""
pass
def event_study_plot(df):
"""
Plot dynamic treatment effects by event time
X-axis: Time relative to treatment (-5 to +5)
Y-axis: Treatment effect with confidence intervals
"""
pass
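For the plotting stub, one common layout is point estimates with 95% confidence bars and reference lines at zero and at the treatment date. The function takes precomputed effects and standard errors; the name and signature are illustrative, not prescribed.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt

def event_study_plot_reference(event_times, effects, ses):
    """Plot dynamic effects by event time with 95% confidence intervals."""
    fig, ax = plt.subplots()
    ax.errorbar(event_times, effects, yerr=1.96 * np.asarray(ses),
                fmt='o', capsize=3)
    ax.axhline(0, color='grey', linewidth=0.8)             # zero-effect line
    ax.axvline(-0.5, color='grey', linestyle='--',
               linewidth=0.8)                              # treatment onset
    ax.set_xlabel('Time relative to treatment')
    ax.set_ylabel('Treatment effect (ATT)')
    return fig
```

Returning the figure (rather than calling `plt.show()`) makes it easy to save with `fig.savefig(...)` for the LaTeX report.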
Paper: Abadie et al. (2010) - Synthetic Control Method
Question: Did California's tobacco tax reduce cigarette sales?
Source: State-level cigarette consumption (39 states × 31 years)
# Option A: synthdid package
pip install synthdid
from synthdid import get_data
california = get_data('california_prop99')
# Option B: Ortega database
# https://github.com/NiclasOrtG/the-synthetic-control-group
from scipy.optimize import minimize
def synthetic_control(Y_target, Y_donors):
"""
Find optimal weights w that minimize ||Y_target - Y_donors @ w||²
Subject to: sum(w) = 1, w >= 0 (convex combination)
Y_target: (T_pre,) pre-treatment outcomes for treated unit
Y_donors: (T_pre, J) pre-treatment outcomes for donor pool
Returns: optimal weights (J,)
"""
def objective(w):
return np.sum((Y_target - Y_donors @ w) ** 2)  # Y_donors is (T_pre, J), w is (J,)
# TODO: Use scipy.optimize.minimize with constraints
pass
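One way to finish the optimizer call is scipy's SLSQP with an equality constraint for the simplex and box bounds for nonnegativity. The function name `synthetic_control_weights` and the uniform starting point are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(Y_target, Y_donors):
    """Solve min_w ||Y_target - Y_donors @ w||^2  s.t.  sum(w) = 1, w >= 0."""
    J = Y_donors.shape[1]

    def objective(w):
        return np.sum((Y_target - Y_donors @ w) ** 2)

    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)  # weights sum to 1
    bounds = [(0.0, 1.0)] * J                                   # convex combination
    res = minimize(objective, x0=np.full(J, 1.0 / J),
                   bounds=bounds, constraints=cons, method='SLSQP')
    return res.x
```

A good unit test is a target that is an exact convex combination of the donors: the solver should recover the true weights almost exactly before you try it on the smoking data.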
Paper: Athey & Wager (2021) - Policy Learning
Question: How to learn optimal treatment assignment rules?
Source: JTPA dataset (~11,600 participants)
# Option A: National JTPA Study public-use files (e.g., via the W.E. Upjohn Institute data archive)
# Option B: MDRC (requires application)
# https://www.mdrc.org/
def doubly_robust_scores(Y, D, X):
"""
Compute doubly robust scores:
Γ = μ₁(X) - μ₀(X) + D(Y - μ₁(X))/e(X) - (1-D)(Y - μ₀(X))/(1-e(X))
where μ₁, μ₀ are outcome models and e(X) is propensity score
TODO: Implement with cross-fitting
"""
pass
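A reference sketch of the cross-fitted AIPW scores in the formula above. Random forests stand in for the nuisance learners μ₁, μ₀, and e(X) purely for illustration; the name `doubly_robust_scores_reference` and the propensity clipping bounds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def doubly_robust_scores_reference(Y, D, X, n_folds=5):
    """Cross-fitted AIPW scores: Gamma = mu1 - mu0 + D*(Y-mu1)/e - (1-D)*(Y-mu0)/(1-e)."""
    n = len(Y)
    mu0, mu1, e = np.zeros(n), np.zeros(n), np.zeros(n)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)
    for tr, te in kf.split(X):
        t1, t0 = tr[D[tr] == 1], tr[D[tr] == 0]  # treated / control train indices
        mu1[te] = RandomForestRegressor(n_estimators=100, random_state=0) \
            .fit(X[t1], Y[t1]).predict(X[te])
        mu0[te] = RandomForestRegressor(n_estimators=100, random_state=0) \
            .fit(X[t0], Y[t0]).predict(X[te])
        e[te] = RandomForestClassifier(n_estimators=100, random_state=0) \
            .fit(X[tr], D[tr]).predict_proba(X[te])[:, 1]
    e = np.clip(e, 0.01, 0.99)  # trim extreme propensities for stability
    return mu1 - mu0 + D * (Y - mu1) / e - (1 - D) * (Y - mu0) / (1 - e)
```

The mean of the scores is a doubly robust ATE estimate, which gives a direct check: simulate confounded data with a known constant effect and verify the score mean recovers it.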
def learn_policy(X, dr_scores, budget=0.5):
"""
Learn policy π(X) that maximizes welfare subject to budget
max E[π(X) * Γ] s.t. E[π(X)] ≤ budget
Solution: Treat if DR score > threshold
"""
pass
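A minimal sketch of the plug-in solution noted in the docstring: treat the units with the largest DR scores, up to the budget share, and only when the score is positive. Note this is not the full Athey & Wager procedure, which restricts π(X) to an interpretable class such as depth-limited policy trees (that is where X enters); this sketch skips that restriction, and the function name is illustrative.

```python
import numpy as np

def learn_policy_reference(dr_scores, budget=0.5):
    """Budgeted plug-in rule: treat up to floor(budget*n) units with the
    highest DR scores, excluding units whose score is nonpositive."""
    dr_scores = np.asarray(dr_scores)
    n = len(dr_scores)
    k = int(np.floor(budget * n))
    if k == 0:
        return np.zeros(n, dtype=int)
    threshold = np.sort(dr_scores)[::-1][k - 1]  # k-th largest score
    # ties at the threshold can slightly exceed the budget with discrete scores
    treat = (dr_scores >= threshold) & (dr_scores > 0)
    return treat.astype(int)
```

The expected welfare of the learned rule is then estimated by `np.mean(pi * dr_scores)`, which is the objective in the docstring evaluated at π.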
Paper: Baker, Bloom & Davis (2016) - Economic Policy Uncertainty (EPU) Index
Question: How can economic policy uncertainty be measured from text, and how does it affect economic outcomes?
Source: Newspaper archives OR EPU index directly
# Option A: EPU index (monthly)
# https://www.policyuncertainty.com/
# Option B: News articles (requires ProQuest/Factiva)
# Python: newspaper3k, beautifulsoup4
# Option C: GDELT project
# https://www.gdeltproject.org/
UNCERTAINTY_TERMS = ['uncertain', 'uncertainty', 'risk', 'volatile']
POLICY_TERMS = ['policy', 'regulation', 'legislation', 'government']
ECONOMIC_TERMS = ['economy', 'growth', 'recession', 'inflation']
def count_epu_articles(articles):
"""
Article counts as EPU if it contains:
- At least one uncertainty term AND
- At least one policy term AND
- At least one economic term
EPU Index = (EPU articles / Total articles) × Normalization
"""
pass
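The triple-intersection rule in the docstring can be sketched as follows. The term lists are repeated so the snippet is self-contained; lowercase substring matching is a simplification (it counts 'uncertain' inside 'uncertainty', and Baker, Bloom & Davis use audited term sets and per-newspaper normalization that this sketch omits).

```python
UNCERTAINTY_TERMS = ['uncertain', 'uncertainty', 'risk', 'volatile']
POLICY_TERMS = ['policy', 'regulation', 'legislation', 'government']
ECONOMIC_TERMS = ['economy', 'growth', 'recession', 'inflation']

def count_epu_articles_reference(articles):
    """Return (epu_count, total): an article counts as EPU if it contains
    at least one term from each of the three lists (case-insensitive)."""
    def has_term(text, terms):
        t = text.lower()
        return any(term in t for term in terms)

    epu = sum(1 for a in articles
              if has_term(a, UNCERTAINTY_TERMS)
              and has_term(a, POLICY_TERMS)
              and has_term(a, ECONOMIC_TERMS))
    return epu, len(articles)
```

The raw index for a period is then `epu / total`, rescaled by whatever normalization you adopt (e.g., to a mean of 100 over a base period).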
| Section | Recommended Length | Content |
|---|---|---|
| Introduction | 1–2 pages | Motivation, research question, scope of the review |
| Conceptual Framework | 2–3 pages | Key concepts, taxonomy of methods (Prediction vs. Causal Inference) |
| Literature Review | 6–8 pages | Thematic synthesis of 10–15 core papers; backward and forward search |
| Gap & Future Directions | 2–3 pages | Critical assessment of missing links and promising extensions |
| Conclusion | 1 page | Summary and takeaways |
| Component | Weight | Criteria |
|---|---|---|
| Literature Coverage | 35% | Relevance, breadth, and use of snowball search |
| Analysis & Synthesis | 35% | Thematic organization, critical comparison, not just listing |
| Writing | 20% | Clarity, organization, professionalism |
| Original Insight | 10% | Quality of identified gaps and future directions |
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx, amsmath, natbib}
\title{Replication and Extension of [Paper Title]}
\author{Your Name\\ECON6083}
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
Brief summary of paper, replication approach, key findings, and extension.
(150-200 words)
\end{abstract}
\section{Introduction}  % 1--2 pages
% Motivation, research question, overview of replication and extension
\section{Literature and Background}  % 1--2 pages
% Related literature, theoretical framework, institutional details
\section{Data}  % 1--2 pages
% Data sources, sample construction, summary statistics (Table 1)
\section{Empirical Strategy}  % 2--3 pages
% Model specification, identification assumptions, estimation details
\section{Replication Results}  % 3--4 pages
% Main results, robustness checks, comparison to original paper
\section{Extension}  % 2--3 pages
% Motivation, methodology, results, interpretation
\section{Conclusion}  % 0.5--1 page
% Summary, limitations, future research
\bibliographystyle{aer}
\bibliography{references}
\appendix
\section{Additional Results}
\section{Code}
\end{document}
| Component | Weight | Criteria |
|---|---|---|
| Replication | 40% | Accuracy, completeness, comparison to original |
| Extension | 30% | Originality, motivation, execution |
| Writing | 20% | Clarity, organization, professionalism |
| Code | 10% | Reproducibility, documentation, style |
pip install numpy pandas matplotlib scikit-learn scipy statsmodels
pip install econml doubleml # DML, Causal Forests
pip install synthdid # Synthetic Control
pip install linearmodels # Panel data
pip install transformers # BERT for text
pip install spacy nltk jieba # Text processing
| Week | Task |
|---|---|
| Week 8-9 | Choose paper, download data, start replication |
| Week 10 | Complete replication, identify extension idea |
| Week 11 | Implement extension |
| Week 12 | Write report, prepare submission |
| May 5 | Final deadline |