Final Project
Choose one of two tracks and submit a 12–15-page academic report written in LaTeX.
Due: May 5, 2026
Weight: 40% of grade
Team size: individual or pairs
Writing & LaTeX Practical Guide
Paper structure, section-by-section writing tips, LaTeX templates, and a submission checklist. Applies to both Track 1 and Track 2.
Quick Start
Track 1: Empirical Replication
Pick one of the six landmark papers below on causal inference and machine learning in economics. Replicate its core results, then propose and run a meaningful extension.
- Pick a paper (Options 1–6)
- Open its Colab notebook (all data is pre-loaded locally)
- Run the replication code
- Design and implement your extension
- Write the report in Overleaf
Track 2: Literature Review
Survey a specific ML-in-economics domain. Synthesize 10–15 core papers thematically.
- Pick a focused domain (e.g., ML in finance, causal ML in policy)
- Do a snowball search: backward (bibliographies) + forward (Google Scholar "Cited by")
- Categorize by prediction vs. causal inference
- Identify gaps and future directions
- Write the review in Overleaf
What to submit: a .zip containing (1) code/, with clean, reproducible Python scripts or notebooks, and (2) report.pdf, the 12–15-page academic report.
Track 1: Six Paper Options
Each option has a runnable Colab notebook with local data pre-loaded in the data/ folder. No external downloads required.
Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1–C68.
Question: What is the causal effect of 401(k) eligibility on net financial assets?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option1_dml_401k.ipynb?force_reload=true
Extension ideas: Compare multiple ML learners (XGBoost, Neural Nets); test sensitivity to the number of cross-fitting folds K; estimate heterogeneous effects by income quartile; apply to the effect of the Housing Provident Fund (HPF) on consumption.
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. JASA, 113(523), 1228–1242.
Question: Are the effects of job training heterogeneous? Who benefits most?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option2_causal_forests.ipynb?force_reload=true
Extension ideas: Compare honest vs adaptive estimation; use SHAP for feature importance; implement personalized policy learning; test robustness to tree depth; apply to targeted poverty alleviation heterogeneity.
Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
Question: What is the effect of minimum wage increases when states adopt them at different times (staggered adoption)?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option3_did.ipynb?force_reload=true
Extension ideas: Compare TWFE vs Callaway-Sant'Anna vs Sun-Abraham vs Borusyak; analyze dynamic effects with event study; test never-treated vs not-yet-treated controls; apply to carbon emission trading pilots (staggered adoption 2013–2014).
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies. JASA, 105(490), 493–505.
Question: Did California's Proposition 99 reduce per-capita cigarette sales?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option4_synthetic_control.ipynb?force_reload=true
Extension ideas: Implement Augmented SCM (Ben-Michael et al. 2021); compare SCM with modern DiD; test donor pool sensitivity (leave-one-out); apply to carbon trading: Guangdong vs donor provinces.
Kitagawa, T., & Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2), 591–616.
Question: How can we learn optimal treatment assignment rules from data?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option5_policy_learning.ipynb?force_reload=true
Extension ideas: Add fairness constraints (demographic parity); compare policy classes (linear vs tree vs budget-constrained); use IPW for evaluation; apply to optimal targeting of poverty alleviation resources.
Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. QJE, 131(4), 1593–1636.
Question: How can economic policy uncertainty (EPU) be measured from newspaper text, and how does it affect macroeconomic outcomes?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option6_epu_text_analysis.ipynb
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option6_china_epu_extension.ipynb?force_reload=true
Extension ideas: Use BERT for sentiment analysis (vs keyword matching); build China-specific EPU index with Jieba; create separate fiscal/monetary/trade indices; compare keyword-based vs ML-based measures.
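The keyword-matching baseline that the BERT extension would be compared against can be sketched in a few lines. Flag an article if it contains at least one term from each of the economy (E), policy (P), and uncertainty (U) categories, divide flagged articles by total articles per month, and rescale to a mean of 100. This is a minimal sketch under stated assumptions. The term lists are abbreviated and illustrative, so check the paper for the exact lists. The code handles a single newspaper: Baker, Bloom & Davis also standardize each paper's series and average across papers, a step that cancels out when there is only one paper. A China index would need Chinese terms and Jieba segmentation instead of regex word boundaries.

```python
import re
from collections import defaultdict

# Abbreviated, illustrative term lists (assumption; see the paper for full lists)
E_TERMS = ["economic", "economy"]
P_TERMS = ["congress", "deficit", "federal reserve", "legislation",
           "regulation", "white house"]
U_TERMS = ["uncertain", "uncertainty"]

def mentions_any(text, terms):
    """True if any term occurs as a whole word or phrase in lowercase text."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)

def epu_index(articles):
    """articles: iterable of (month, text) pairs from ONE newspaper.

    Returns {month: index}, where the index is the share of articles that
    mention E, P, and U terms, rescaled to average 100 over the sample.
    Raises ZeroDivisionError if no article in any month is flagged.
    """
    total, hits = defaultdict(int), defaultdict(int)
    for month, text in articles:
        text = text.lower()
        total[month] += 1
        if all(mentions_any(text, terms) for terms in (E_TERMS, P_TERMS, U_TERMS)):
            hits[month] += 1
    months = sorted(total)
    share = {m: hits[m] / total[m] for m in months}   # scale by article volume
    mean_share = sum(share.values()) / len(share)
    return {m: 100 * share[m] / mean_share for m in months}

# Hypothetical toy headlines, not real data
articles = [
    ("2020-01", "Economic uncertainty rises as Congress debates the deficit."),
    ("2020-01", "Local team wins the championship."),
    ("2020-02", "The economy grows; regulation is unchanged."),
    ("2020-02", "Uncertain outlook as the White House weighs economic legislation."),
    ("2020-02", "Weather stays mild."),
]
print(epu_index(articles))
```

Because the index is a ratio to total articles, months with unusually many or few articles are not mechanically high or low. That is the main reason for scaling before normalizing.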
Track 2: Literature Review
If you prefer a non-empirical project, conduct a structured literature review instead of an empirical replication.
How to do a snowball search
- Pick a seed paper from class (e.g., Mullainathan & Spiess 2017 on ML in policy; Gu, Kelly & Xiu 2020 on asset pricing).
- Backward search: Read the bibliography of your seed paper to find foundational work.
- Forward search: Use Google Scholar "Cited by" to find recent (2025–2026) developments.
- Categorize: Group papers by prediction (E[Y|X]) vs. causal inference vs. policy.
- Synthesize: Don't just list papers; compare them. How do newer ML methods improve on traditional econometrics?
- Identify the gap: What is missing? Structural interpretation? Selection bias handling? External validity?
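The backward and forward steps above amount to a breadth-first traversal of the citation graph. The sketch below is minimal and assumes you have recorded, by hand, each paper's references and its "Cited by" list in two dictionaries. The paper IDs are hypothetical placeholders.

```python
from collections import deque

def snowball(seed, references, cited_by, max_depth=2):
    """Breadth-first snowball search over a citation graph.

    references[p] -> papers that p cites (backward search)
    cited_by[p]   -> papers that cite p (forward search)
    Returns {paper: depth at which it was first reached}.
    """
    found = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        depth = found[paper]
        if depth == max_depth:
            continue  # do not expand papers at the depth limit
        for neighbor in references.get(paper, []) + cited_by.get(paper, []):
            if neighbor not in found:
                found[neighbor] = depth + 1
                queue.append(neighbor)
    return found

# Hypothetical toy citation graph (IDs are placeholders, not real papers)
references = {"seed": ["A", "B"], "A": ["C"]}
cited_by = {"seed": ["D"], "D": ["E"], "C": ["F"]}

result = snowball("seed", references, cited_by, max_depth=2)
print(sorted(result.items(), key=lambda kv: (kv[1], kv[0])))
```

In practice, a depth of 1–2 is usually enough. Screen each newly reached paper for relevance before expanding it, or the list grows quickly.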
Suggested Structure
| Section | Length | Content |
| Introduction | 1–2 pgs | Motivation, scope, research question |
| Conceptual Framework | 2–3 pgs | Taxonomy: prediction vs. causal inference |
| Literature Review | 6–8 pgs | Thematic synthesis of 10–15 core papers |
| Gap & Future Directions | 2–3 pgs | Critical assessment of missing links |
| Conclusion | 1 pg | Summary and takeaways |
Grading Rubric (Track 2)
| Literature Coverage | 35% | Relevance, breadth, snowball search quality |
| Analysis & Synthesis | 35% | Thematic organization, critical comparison |
| Writing | 20% | Clarity, organization, professionalism |
| Original Insight | 10% | Quality of identified gaps |
Report Requirements (Both Tracks)
Format
- Length: 12–15 pages (excl. references & appendices)
- Tool: LaTeX on Overleaf
- Citations: AER style
- Font: 11–12pt, 1-inch margins
Track 1 Rubric
| Replication | 40% |
| Extension | 30% |
| Writing | 20% |
| Code | 10% |
How to Write Each Section (Track 1)
See the Writing & LaTeX Practical Guide slides for detailed section-by-section guidance, examples, and templates. Below is a quick reference.
Abstract (150–200 words)
One paragraph covering: (i) original paper, (ii) method, (iii) replication finding, (iv) extension & result. No citations.
1. Introduction (1–2 pages)
Reverse pyramid: broad motivation → specific paper → your replication → your extension → roadmap. Your own work should occupy at least half the section.
2. Literature (1–2 pages)
Group thematically (method, applications, China-related). Use transition sentences. End with "Our paper contributes by..."
3. Data (1–2 pages)
State source, sample size, unit of analysis, and exclusions. Include Table 1 (summary statistics) with only variables you use.
4. Empirical Strategy (2–3 pages)
Write the estimating equation, define every symbol, state identification assumptions, and explain standard errors.
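As an illustration only (not a required specification), a DiD paper in the style of Option 3 might state its estimating equation as:

```latex
\begin{equation}
  Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + X_{it}'\gamma + \varepsilon_{it}
\end{equation}
```

The text would then define each symbol. Y_it is the outcome of city i in year t. α_i and λ_t are city and year fixed effects. D_it equals 1 once city i has adopted the policy. X_it collects time-varying controls with coefficients γ. ε_it is the error term. It would also state the identifying assumptions (e.g., parallel trends and no anticipation) and how standard errors are computed (e.g., clustered by city).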
5. Replication Results (3–4 pages)
Reproduce the core result, compare to the original, and run 2β3 robustness checks. Include figures (event-study, CATE, etc.).
6. Extension (2–3 pages)
Motivate the new question, explain the method difference, present 1β2 tables/figures, and interpret magnitudes economically.
7. Conclusion (0.5–1 page)
One sentence each for replication and extension findings, one explicit limitation, and one future direction. Synthesize, don't restate.
Tables & Figures Checklist
- Every table has a caption above it; every figure has a caption below it
- Axis labels are readable (no tiny fonts)
- Error bars / confidence intervals are shown
- Regression tables report N, R², and fixed effects
- Stars defined: * p<0.1, ** p<0.05, *** p<0.01
- At least 1 table and 1 figure in the main text
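To keep the star convention identical across every regression table, you could generate the cells with a small helper. The sketch below is minimal, and its function names are hypothetical rather than taken from any course code. It uses strict inequalities to match the thresholds listed above.

```python
def stars(p):
    """Significance stars: * p<0.1, ** p<0.05, *** p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""

def format_coef(beta, se, p, digits=3):
    """Return the two table cells for one coefficient: 'beta***' and '(se)'."""
    return f"{beta:.{digits}f}{stars(p)}", f"({se:.{digits}f})"

print(format_coef(0.1234, 0.0456, 0.007))
```

The standard error goes in parentheses on the row below the coefficient. The table notes should still define the stars explicitly.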
LaTeX Starter Template
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx, amsmath, natbib, booktabs, hyperref}
\title{Replication and Extension of [Paper Title]}
\author{Your Name \\ Student ID \\ ECON6083}
\date{\today}
\begin{document}
\maketitle
\begin{abstract}
\noindent Brief summary of paper, replication approach,
key findings, and extension. (150--200 words)
\end{abstract}
\section{Introduction}
\section{Literature}
\section{Data}
\section{Empirical Strategy}
\section{Replication Results}
\section{Extension}
\section{Conclusion}
\bibliographystyle{aer}
\bibliography{references}
\appendix
\section{Additional Results}
\section{Code}
\end{document}
How to Write Each Section (Track 2)
See the Writing & LaTeX Practical Guide slides for detailed section-by-section guidance, examples, and templates. Below is a quick reference.
1. Introduction (1–2 pages)
Broad motivation → scope of review → research question → roadmap. Be specific about the economic subdomain.
2. Conceptual Framework (2–3 pages)
Build a 2×2 taxonomy (e.g., prediction vs. causal × cross-sectional vs. panel). Define key terms and highlight trade-offs.
3. Literature Review (6–8 pages)
Organize thematically (not chronologically): foundational methods, applications, and criticisms. Compare, don't list.
4. Gap & Future Directions (2–3 pages)
Identify methodological, empirical, and external-validity gaps. Propose 2–3 concrete research projects to fill them.
5. Conclusion (1 page)
Summarize 2β3 takeaways, restate the most important gap, and end with a forward-looking sentence.
Chinese Datasets (Locally Hosted)
All China data is bundled locally. No registration or external download required.
| Dataset | Scope | Size | Best For |
| CFPS Panel (data/cfps_panel.csv) | 21,233 individuals, 7 waves (2010–2022), 52 vars | 22.5 MB | Options 1, 2, 5 |
| City Panel + Policies (data/china_city_panel_with_policies.csv) | 300 cities × 34 years (1990–2023), 40 economic vars + 5 policy indicators | 3.1 MB | Options 3, 4 |
Policy Indicators (all pre-merged in the City Panel)
| Policy | Indicator Columns | Treated | Batches | Method |
| LCCP (Low-Carbon City Pilots) | lccp_treat, lccp_batch, lccp_first_year | 49 cities | 2010 / 2012 / 2017 | Staggered DiD |
| Sponge City | sponge_treat, sponge_batch, sponge_first_year | 23 cities | 2015 / 2016 | DiD or SCM |
| CBEC (Cross-Border E-Commerce) | cbec_treat, cbec_batch, cbec_first_year | 99 cities | 2015 / 2016 / 2018 / 2019 / 2020 | Staggered DiD |
| LTCI (Long-Term Care Insurance) | ltci_treat, ltci_batch, ltci_first_year | 26 cities | 2016 / 2020 | Staggered DiD |
| Smart City | smart_treat, smart_batch, smart_first_year | 115 cities | 2013 / 2014 / 2015 | Staggered DiD |
All policy indicators are already merged into china_city_panel_with_policies.csv. Each policy has three columns: _treat (ever treated = 1), _batch (which batch), and _first_year (adoption year, 0 for never-treated). Load one file and run DiD/SCM immediately; no merges are needed.
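Never-treated cities carry _first_year = 0, so a naive year >= first_year comparison would mark them as treated in every year. The pandas sketch below builds the treatment dummy and the event time correctly. It assumes the panel's time column is named year and uses a hypothetical toy panel, so check the actual column names in the file.

```python
import pandas as pd

def add_timing(df, prefix, time="year"):
    """Add a treatment dummy and event time for one policy (e.g. prefix='lccp').

    Uses the f'{prefix}_first_year' column, where 0 marks never-treated cities.
    The time column name 'year' is an assumption about the file.
    """
    out = df.copy()
    first = out[f"{prefix}_first_year"]
    ever = first > 0
    # D_it = 1 from the adoption year onward, and only for ever-treated cities
    out[f"{prefix}_post"] = (ever & (out[time] >= first)).astype(int)
    # Years relative to adoption; NaN for never-treated cities
    out[f"{prefix}_event_time"] = (out[time] - first).where(ever)
    return out

# Hypothetical two-city toy panel (city B is never treated)
toy = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "year": [2009, 2010, 2011, 2009, 2010, 2011],
    "lccp_first_year": [2010, 2010, 2010, 0, 0, 0],
})
print(add_timing(toy, "lccp"))
```

The same function works for any of the five policies by changing prefix (e.g. "sponge", "cbec", "ltci", "smart").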
Direct Download Links
# CFPS Panel
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/cfps_panel.csv
# City Panel + All 5 Policy Indicators
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_city_panel_with_policies.csv
# China EPU / CPI / PMI / LPR
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_epu.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_cpi.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_pmi.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_lpr.csv
Suggested Extensions with Local Data
Each option below pairs the original US/EU paper with a concrete China extension using the datasets above. Except for Option 6, which uses time series, the options exploit the panel structure of the data.
Option 1 (DML): CFPS Internet Access
Question: What is the causal effect of internet access on household income?
- Data: cfps_panel.csv (treatment: internet; outcome: lnincome)
- Model: Partially linear DML
- Extension: Compare Lasso vs. Random Forest nuisance functions; estimate CATE by education.
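The sketch below shows the cross-fitted partialling-out estimator for the partially linear model Y = θD + g(X) + U, using numpy only. It rests on several assumptions. The nuisance learner is a cubic-polynomial OLS, a stand-in chosen to keep the example dependency-free; the extension would swap in Lasso and random forests (e.g., via scikit-learn or econml). The data are simulated, not CFPS.

```python
import numpy as np

def poly_fit_predict(X_train, y_train, X_test):
    """Nuisance learner: OLS on [1, X, X^2, X^3]. A stand-in for Lasso/RF."""
    def basis(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2, X ** 3])
    coef, *_ = np.linalg.lstsq(basis(X_train), y_train, rcond=None)
    return basis(X_test) @ coef

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta in Y = theta * D + g(X) + U."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Residualize Y and D with models fit on the other folds only
        y_res[test] = y[test] - poly_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - poly_fit_predict(X[train], d[train], X[test])
    # Partialling-out estimator: regress Y residuals on D residuals
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Robust standard error from the orthogonal (Neyman) score
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se

# Simulated data (not CFPS): binary treatment confounded through X[:, 0]
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
y = 0.3 * d + np.sin(X[:, 0]) + X[:, 2] ** 2 + rng.normal(size=n)
theta, se = dml_plr(y, d, X)
print(f"theta = {theta:.3f} (SE {se:.3f}); true value 0.3")
```

Comparing Lasso and random-forest nuisances amounts to replacing poly_fit_predict and checking how much θ and its standard error move. Cross-fitting stays the same.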
Option 2 (Causal Forests): CFPS Health Insurance Heterogeneity
Question: Who benefits most from medical insurance?
- Data: cfps_panel.csv (treatment: medsure_dum; outcome: health)
- Model: Causal forest for CATE estimation
- Extension: Plot heterogeneity by age × education; use SHAP for feature importance.
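Before plotting causal-forest CATEs by age × education, it can help to tabulate a transparent benchmark: the treated-minus-control mean difference within each cell. The sketch below computes that crude cell estimator; it is not a causal forest. It is causal only if insurance is as good as randomly assigned within each cell, which is unlikely in CFPS without further adjustment. The data and cell labels are simulated and hypothetical.

```python
import numpy as np

def cell_cate(y, d, groups):
    """Treated-minus-control mean outcome within each covariate cell."""
    est = {}
    for g in np.unique(groups):
        treated = (groups == g) & (d == 1)
        control = (groups == g) & (d == 0)
        if treated.any() and control.any():  # skip cells lacking overlap
            est[str(g)] = y[treated].mean() - y[control].mean()
    return est

# Simulated data with randomized treatment and hypothetical cell effects
rng = np.random.default_rng(1)
n = 4000
age = rng.choice(["young", "old"], size=n)
edu = rng.choice(["low", "high"], size=n)
groups = np.array([f"{a}|{e}" for a, e in zip(age, edu)])
d = rng.integers(0, 2, size=n)
true_tau = {"young|low": 0.0, "young|high": 0.2, "old|low": 0.8, "old|high": 0.4}
tau = np.array([true_tau[g] for g in groups])
y = tau * d + rng.normal(size=n)
est = cell_cate(y, d, groups)
for g in sorted(est):
    print(f"{g}: {est[g]:.2f} (true {true_tau[g]})")
```

If the forest's average CATE in a cell is far from this benchmark, first check covariate balance and sample size within that cell.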
Option 3 (DiD): Low-Carbon City Pilots (LCCP)
Question: Did low-carbon city pilots reduce SO₂ emissions?
- Data: china_city_panel_with_policies.csv (policy: lccp_treat)
- Model: Staggered DiD with city and year fixed effects
- Extension: Event-study plot; Callaway-Sant'Anna group-time ATT; compare with Sponge City / CBEC / Smart City effects.
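The pandas sketch below computes Callaway–Sant'Anna group-time ATTs under several simplifying assumptions. It requires a balanced panel, uses no covariates, takes never-treated cities as the comparison group, and fixes a universal base period of g − 1. The outcome name so2 and the simulated panel are hypothetical. For inference and covariate adjustment, use a dedicated implementation rather than this sketch.

```python
import numpy as np
import pandas as pd

def att_gt(df, y, unit="city", time="year", first="lccp_first_year"):
    """Group-time ATTs with never-treated controls and base period g - 1.

    ATT(g, t) = E[Y_t - Y_{g-1} | G = g] - E[Y_t - Y_{g-1} | never treated],
    where G is the adoption year (0 = never treated). Needs a balanced panel.
    """
    wide = df.pivot(index=unit, columns=time, values=y)
    cohort = df.groupby(unit)[first].first().reindex(wide.index)
    never = wide[cohort == 0]
    rows = []
    for g in sorted(cohort[cohort > 0].unique()):
        base = g - 1
        if base not in wide.columns:
            continue  # no pre-period observed for this cohort
        treated = wide[cohort == g]
        for t in wide.columns:
            if t == base:
                continue
            gain_treated = (treated[t] - treated[base]).mean()
            gain_never = (never[t] - never[base]).mean()
            rows.append({"g": g, "t": t, "att": gain_treated - gain_never})
    return pd.DataFrame(rows)

# Simulated panel: effect of 1.0 from adoption onward (hypothetical numbers)
rng = np.random.default_rng(0)
records = []
for c in range(60):
    g = [0, 2010, 2012][c % 3]      # cohorts: never, 2010, 2012
    alpha = rng.normal()            # city fixed effect
    for t in range(2008, 2016):
        effect = 1.0 if (g > 0 and t >= g) else 0.0
        y_ct = alpha + 0.1 * (t - 2008) + effect + rng.normal(scale=0.1)
        records.append({"city": c, "year": t, "lccp_first_year": g, "so2": y_ct})
panel = pd.DataFrame(records)
res = att_gt(panel, "so2")
print(res.round(2).head(8))
```

Rows with t < g act as pre-trend checks, and averaging rows by t − g gives an event-study profile. Swapping never for not-yet-treated cities is the control-group comparison that Option 3's extension mentions.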
Option 3b (DiD): Sponge City Pilots
Question: Did sponge city investment increase fiscal expenditure?
- Data: china_city_panel_with_policies.csv (policy: sponge_treat)
- Model: Staggered DiD
- Extension: Compare TWFE vs. Borusyak et al. imputation estimator; placebo leads.
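The numpy sketch below contrasts the imputation idea of Borusyak et al. with static TWFE. The imputation approach has three steps: fit unit and year fixed effects on untreated observations only, impute the untreated outcome for treated observations, and average the differences. The sketch is limited: unit and year effects only, no covariates, no standard errors. It also assumes every treated unit has at least one untreated year and every year has some untreated units. The staggered panel and its growing effects are simulated.

```python
import numpy as np

def fe_design(unit, time):
    """Dummy matrix: one column per unit plus time dummies (first period dropped)."""
    _, u_idx = np.unique(unit, return_inverse=True)
    _, t_idx = np.unique(time, return_inverse=True)
    n_u, n_t = u_idx.max() + 1, t_idx.max() + 1
    X = np.zeros((len(unit), n_u + n_t - 1))
    X[np.arange(len(unit)), u_idx] = 1.0
    later = t_idx > 0
    X[np.flatnonzero(later), n_u + t_idx[later] - 1] = 1.0
    return X

def imputation_att(y, d, X):
    """Fit FE on untreated obs, impute Y(0) for treated obs, average the gaps."""
    coef, *_ = np.linalg.lstsq(X[d == 0], y[d == 0], rcond=None)
    return np.mean(y[d == 1] - X[d == 1] @ coef)

def twfe_beta(y, d, X):
    """Static TWFE coefficient on D, for comparison."""
    coef, *_ = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)
    return coef[-1]

# Simulated staggered panel: never-treated, 2010, and 2013 cohorts (hypothetical)
rng = np.random.default_rng(0)
n_units, years = 30, np.arange(2008, 2017)
unit = np.repeat(np.arange(n_units), len(years))
time = np.tile(years, n_units)
first = np.array([0, 2010, 2013])[unit % 3]
d = ((first > 0) & (time >= first)).astype(int)
effect = np.where(d == 1, 0.5 * (time - first + 1), 0.0)  # grows with exposure
y = (rng.normal(size=n_units)[unit] + 0.2 * (time - 2008) + effect
     + rng.normal(scale=0.1, size=len(unit)))
X = fe_design(unit, time)
true_att = effect[d == 1].mean()
print(f"true {true_att:.2f}  imputation {imputation_att(y, d, X):.2f}  "
      f"TWFE {twfe_beta(y, d, X):.2f}")
```

With effects that grow over time and differ across cohorts, the static TWFE coefficient is a weighted average that can differ from the ATT. The imputation estimate should track the true value closely here.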
Option 3c (DiD): CBEC Pilot Zones
Question: Did CBEC zones boost exports and FDI?
- Data: china_city_panel_with_policies.csv (policy: cbec_treat)
- Model: Staggered DiD
- Extension: Heterogeneous effects by region (East vs. West); dynamic effects with event study.
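The numpy sketch below runs an event-study (dynamic TWFE) regression with leads and lags. It omits event time -1 as the reference period and bins the endpoints. The lead coefficients double as the placebo-lead pre-trend check that Options 3b and 3e mention. When effects differ across adoption cohorts, TWFE event-study coefficients can be contaminated by other periods' effects, so compare them with Callaway–Sant'Anna or imputation-based estimates. The data are simulated, and no standard errors are computed.

```python
import numpy as np

def event_study(y, unit, time, first, leads=3, lags=3):
    """TWFE event-study coefficients, with event time -1 as the reference.

    Event time k = time - first for ever-treated units (first > 0), binned
    at [-leads, lags]; never-treated units (first == 0) get all zeros.
    """
    _, u_idx = np.unique(unit, return_inverse=True)
    _, t_idx = np.unique(time, return_inverse=True)
    n_u, n_t = u_idx.max() + 1, t_idx.max() + 1
    fe = np.zeros((len(y), n_u + n_t - 1))        # unit + year dummies
    fe[np.arange(len(y)), u_idx] = 1.0
    later = t_idx > 0
    fe[np.flatnonzero(later), n_u + t_idx[later] - 1] = 1.0
    k = np.clip(time - first, -leads, lags)        # bin the endpoints
    ks = [j for j in range(-leads, lags + 1) if j != -1]
    ev = np.column_stack([((first > 0) & (k == j)).astype(float) for j in ks])
    coef, *_ = np.linalg.lstsq(np.column_stack([fe, ev]), y, rcond=None)
    return dict(zip(ks, coef[-len(ks):]))

# Simulated panel with a constant effect of 1.0 from adoption onward
rng = np.random.default_rng(0)
n_units, years = 40, np.arange(2008, 2018)
unit = np.repeat(np.arange(n_units), len(years))
time = np.tile(years, n_units)
first = np.array([0, 2011, 2014])[unit % 3]
post = (first > 0) & (time >= first)
y = (rng.normal(size=n_units)[unit] + 0.1 * (time - 2008) + 1.0 * post
     + rng.normal(scale=0.1, size=len(unit)))
coefs = event_study(y, unit, time, first)
for j, b in coefs.items():
    print(f"k = {j:+d}: {b:.2f}")
```

Lead coefficients near zero are consistent with, but do not prove, parallel pre-trends. Plot them with confidence intervals in the report.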
Option 3d (DiD): Smart City Pilots
Question: Did smart city construction improve environmental outcomes?
- Data: china_city_panel_with_policies.csv (policy: smart_treat)
- Model: Staggered DiD
- Extension: Test the green-innovation channel (use sci_exp as a mediator); spatial spillovers.
Option 3e (DiD): LTCI Pilots
Question: Did long-term care insurance affect fiscal health spending?
- Data: china_city_panel_with_policies.csv (policy: ltci_treat)
- Model: Staggered DiD
- Extension: Compare 2016 vs. 2020 batch effects; test for pre-trends with placebo leads.
Option 4 (SCM): Any Single-Treated City
Question: What is the effect of a specific policy on a single city?
- Data: china_city_panel_with_policies.csv
- Model: Synthetic control method
- Extension: Placebo test on donors; leave-one-out sensitivity; compare SCM estimate with DiD.
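The numpy sketch below chooses synthetic-control weights: nonnegative donor weights that sum to one and best reproduce the treated city's pre-treatment outcomes. It makes simplifications. It matches pre-period outcomes only, with no covariates or predictor weighting as in Abadie et al. It solves the constrained least-squares problem by projected gradient descent, a solver choice made for this sketch. The donors are simulated random walks.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / ks > 0)[0][-1]
    shift = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - shift, 0.0)

def scm_weights(y1_pre, Y0_pre, n_iter=10000):
    """Donor weights minimizing ||y1_pre - Y0_pre @ w||^2 over the simplex.

    y1_pre: (T0,) pre-treatment outcomes of the treated city
    Y0_pre: (T0, J) pre-treatment outcomes of the J donor cities
    """
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)
    # Step 1/L with L = 2 * sigma_max(Y0_pre)^2, the gradient's Lipschitz constant
    step = 1.0 / (2 * np.linalg.norm(Y0_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2 * Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated donors (random walks) and a treated unit built from three of them
rng = np.random.default_rng(0)
T, T0, J = 30, 20, 10
Y0 = np.cumsum(rng.normal(size=(T, J)), axis=0)
true_w = np.zeros(J)
true_w[:3] = [0.5, 0.3, 0.2]
y1 = Y0 @ true_w
y1[T0:] -= 2.0                      # hypothetical effect after treatment
w = scm_weights(y1[:T0], Y0[:T0])
gap = y1 - Y0 @ w                   # treated minus synthetic control
print("weights:", np.round(w, 2))
print("pre-period RMSPE:", round(float(np.sqrt(np.mean(gap[:T0] ** 2))), 3))
print("mean post-period gap:", round(float(gap[T0:].mean()), 2))
```

For the placebo test, rerun scm_weights treating each donor in turn as the treated city and compare its post/pre RMSPE ratio with the treated city's. For leave-one-out, drop one donor column at a time.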
Option 5 (Policy Learning): CFPS Digital Inclusion
Question: Who should be targeted for digital-inclusion subsidies?
- Data: cfps_panel.csv (treatment: internet; outcome: lnincome)
- Model: Empirical welfare maximization
- Extension: Budget-constrained tree (subsidize at most 30% of the sample); compare a naive rule (target the poorest) with the learned rule.
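Empirical welfare maximization picks the rule with the highest estimated welfare within a restricted policy class, with welfare estimated by inverse propensity weighting (IPW). The numpy sketch below makes several assumptions. The propensity score is known, as in a randomized experiment; with CFPS you would plug in an estimated propensity for internet access. The class contains single-covariate threshold rules, which are depth-1 trees rather than the deeper budget-constrained tree the extension asks for. At most 30% of units may be treated. The covariates and effect pattern are simulated and hypothetical.

```python
import numpy as np

def ipw_welfare(y, d, e, treat):
    """IPW estimate of the mean outcome if exactly the units with treat=True are treated."""
    return np.mean(y * d / e * treat + y * (1 - d) / (1 - e) * (~treat))

def ewm_threshold(y, d, e, X, budget=0.3):
    """Search rules 'treat if s * X[:, j] <= c' (s = +1 or -1) within a budget."""
    best = (None, None, None, -np.inf)   # (column j, sign s, cutoff c, welfare)
    for j in range(X.shape[1]):
        for s in (1.0, -1.0):
            z = s * X[:, j]
            for c in np.unique(z):       # ascending, so the treated share grows
                treat = z <= c
                if treat.mean() > budget:
                    break                # every larger cutoff also exceeds budget
                w = ipw_welfare(y, d, e, treat)
                if w > best[3]:
                    best = (j, s, c, w)
    return best

# Simulated randomized experiment (hypothetical covariates and effects)
rng = np.random.default_rng(0)
n = 2000
income = rng.lognormal(mean=0.0, sigma=0.5, size=n)
age = rng.uniform(20, 70, size=n)
X = np.column_stack([income, age])
e = 0.5                                   # known propensity score
d = rng.binomial(1, e, size=n)
tau = np.clip((50 - age) / 30, 0, None)   # younger people gain more
y = 0.2 * income + tau * d + rng.normal(scale=0.5, size=n)

j, s, c, w_learned = ewm_threshold(y, d, e, X)
naive = income <= np.quantile(income, 0.3)  # target the poorest 30%
print(f"learned rule: column {j}, sign {s:+.0f}, cutoff {c:.1f}")
print(f"welfare: learned {w_learned:.3f} vs naive {ipw_welfare(y, d, e, naive):.3f}")
```

The learned rule's in-sample welfare is optimistically biased because the same data choose and evaluate it. Report the comparison on a held-out split or with cross-fitting.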
Option 6 (Text/EPU): China EPU + Macro
Question: How does policy uncertainty affect industrial production?
- Data: china_epu.csv + china_pmi.csv
- Model: Time-series regression of PMI on EPU
- Extension: Lagged EPU(t-1) → PMI(t); BERT-based sentiment vs. keyword measure; fiscal/monetary sub-indices.
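The sketch below estimates the lagged regression PMI_t = α + β·EPU_{t-1} + ε_t. It assumes the two files have already been merged by month into a DataFrame with columns named epu and pmi; these are hypothetical names, so check the CSV headers. Monthly macro series are usually serially correlated, so it reports Newey–West (HAC) standard errors with a Bartlett kernel. The truncation lag of 3 is an arbitrary illustrative choice, and the series are simulated.

```python
import numpy as np
import pandas as pd

def lagged_ols(df, y="pmi", x="epu", lag=1, hac_lags=3):
    """OLS of y_t on a constant and x_{t-lag}, with Newey-West (Bartlett) SEs."""
    data = pd.DataFrame({"y": df[y], "x_lag": df[x].shift(lag)}).dropna()
    Y = data["y"].to_numpy()
    X = np.column_stack([np.ones(len(data)), data["x_lag"].to_numpy()])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    Xu = X * (Y - X @ beta)[:, None]      # score contributions x_t * u_t
    S = Xu.T @ Xu                          # lag-0 term
    for l in range(1, hac_lags + 1):
        G = Xu[l:].T @ Xu[:-l]             # sum_t x_t u_t u_{t-l} x_{t-l}'
        S += (1 - l / (hac_lags + 1)) * (G + G.T)
    V = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Simulated monthly series (not the real files): PMI responds to last month's EPU
rng = np.random.default_rng(0)
T = 240
epu = 100 + 20 * rng.normal(size=T)
pmi = np.full(T, 50.0)
pmi[1:] = 52 - 0.02 * epu[:-1] + rng.normal(scale=0.5, size=T - 1)
df = pd.DataFrame({"epu": epu, "pmi": pmi})
beta, se = lagged_ols(df)
print(f"beta_EPU(t-1) = {beta[1]:.4f} (HAC SE {se[1]:.4f})")
```

statsmodels' OLS with cov_type="HAC" offers the same correction and is a useful cross-check. Adding lagged PMI as a regressor controls for persistence in the outcome.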
Deadline: May 5, 2026. Start early: replication and debugging always take longer than expected. Come to office hours if you get stuck.
Resources
Required Python packages
pip install numpy pandas matplotlib scikit-learn scipy statsmodels
Optional packages
pip install econml # DML, Causal Forests
pip install synthdid # Synthetic DiD / synthetic control
pip install linearmodels # Panel data
pip install transformers # BERT for text
pip install spacy nltk jieba # Text processing
Report checklist
- Title page with name & student ID
- Abstract (150–200 words)
- At least 1 table and 1 figure
- References in AER format
- Clean, commented code
- PDF compiles without errors