Final Project

Choose one of two tracks. Submit a 12–15 page academic report in LaTeX.

📅 Due: May 5, 2026 · ⚖️ Weight: 40% of grade · 👥 Individual or pairs

Writing & LaTeX Practical Guide

Paper structure, section-by-section writing tips, LaTeX templates, and submission checklist. Applies to both Track 1 and Track 2.

📥 Download Slides

🚀 Quick Start

Track 1: Empirical Replication

Pick one of 6 landmark causal ML papers below. Replicate its core results, then propose and run a meaningful extension.

  1. Pick a paper (Option 1–6)
  2. Open the Colab notebook (all data is pre-loaded locally)
  3. Run the replication code
  4. Design and implement your extension
  5. Write the report in Overleaf
📝 Open Overleaf

Track 2: Literature Review

Survey a specific ML-in-economics domain. Synthesize 10–15 core papers thematically.

  1. Pick a focused domain (e.g., ML in finance, causal ML in policy)
  2. Do a snowball search: backward (bibliographies) + forward (Google Scholar "Cited by")
  3. Categorize by prediction vs. causal inference
  4. Identify gaps and future directions
  5. Write the review in Overleaf
📦 What to submit: A .zip containing (1) code/: clean, reproducible Python scripts or notebooks, and (2) report.pdf: a 12–15 page academic paper.

🔬 Track 1: Six Paper Options

Each option has a runnable Colab notebook with local data pre-loaded in the data/ folder. No external downloads required.

Option 1: Double Machine Learning (local data)
Chernozhukov, V., et al. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1), C1–C68.
Question: What is the causal effect of 401(k) eligibility on net financial assets?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option1_dml_401k.ipynb?force_reload=true
Extension ideas: Compare multiple ML learners (XGBoost, Neural Nets); test sensitivity to K folds; estimate heterogeneous effects by income quartile; apply to Housing Provident Fund (HPF) effect on consumption.
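The cross-fitting idea behind Option 1, and the K-fold sensitivity extension, can be sketched roughly as follows. This is a minimal illustration on simulated data, not the notebook's code. The function names `ols_fit_predict` and `dml_plr` are made up for this example, and a least-squares fit stands in for the ML learners (random forest, XGBoost, and so on) only to keep the block dependency-light. Any regressor with a fit/predict step could be swapped in.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares nuisance learner (a stand-in for any ML regressor)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance functions are fit on the other folds only (cross-fitting).
        y_res[test_idx] = y[test_idx] - ols_fit_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - ols_fit_predict(X[train], d[train], X[test_idx])
    # Residual-on-residual regression gives the treatment coefficient.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Heteroskedasticity-robust standard error based on the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi**2) / np.mean(d_res**2) ** 2 / n)
    return theta, se

# Simulated data with a known effect of 2.0 (NOT the 401(k) data).
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = 2.0 * d + X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=n)

# Sensitivity of the estimate to the number of folds K.
for k in (2, 5, 10):
    theta_hat, se_hat = dml_plr(y, d, X, n_folds=k)
    print(f"K={k:2d}  theta={theta_hat:.3f}  se={se_hat:.3f}")
```

In a real extension, you would report the same table for each learner and each K, and comment on whether the point estimate moves by more than its standard error.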
Option 2: Causal Forests (local data)
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. JASA, 113(523), 1228–1242.
Question: Are the effects of job training heterogeneous? Who benefits most?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option2_causal_forests.ipynb?force_reload=true
Extension ideas: Compare honest vs adaptive estimation; use SHAP for feature importance; implement personalized policy learning; test robustness to tree depth; apply to targeted poverty alleviation heterogeneity.
Option 3: Difference-in-Differences (local data)
Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
Question: What is the effect of minimum wage increases under staggered adoption?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option3_did.ipynb?force_reload=true
Extension ideas: Compare TWFE vs Callaway-Sant'Anna vs Sun-Abraham vs Borusyak; analyze dynamic effects with event study; test never-treated vs not-yet-treated controls; apply to carbon emission trading pilots (staggered adoption 2013–2014).
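The never-treated vs not-yet-treated comparison can be sketched with a bare-bones group-time effect, ATT(g, t), computed as the change in outcomes from period g-1 to t for cohort g minus the same change in the comparison group. The block below is an assumption-laden toy: it has no covariates and no inference, the data are simulated, and `att_gt` and the column names (`unit`, `year`, `first_year`, `y`) are hypothetical. It is not the Callaway-Sant'Anna package implementation.

```python
import numpy as np
import pandas as pd

def att_gt(df, g, t, control="never"):
    """Unconditional ATT(g, t) for t >= g, using g-1 as the base period."""
    wide = df.pivot(index="unit", columns="year", values="y")
    first = df.groupby("unit")["first_year"].first()
    change = wide[t] - wide[g - 1]
    treated = first == g
    if control == "never":
        comparison = first == 0                 # never-treated units only
    else:
        comparison = (first == 0) | (first > t)  # not yet treated by period t
    return change[treated].mean() - change[comparison].mean()

# Simulated staggered panel: cohorts 2012 and 2014, plus never-treated (0).
rng = np.random.default_rng(0)
rows = []
for unit in range(180):
    first_year = (0, 2012, 2014)[unit % 3]
    unit_fe = rng.normal()
    for year in range(2010, 2017):
        post = first_year != 0 and year >= first_year
        y = unit_fe + 0.1 * (year - 2010) + 1.0 * post + rng.normal(scale=0.1)
        rows.append({"unit": unit, "year": year, "first_year": first_year, "y": y})
panel = pd.DataFrame(rows)

for t in (2012, 2013):
    never = att_gt(panel, g=2012, t=t, control="never")
    notyet = att_gt(panel, g=2012, t=t, control="notyet")
    print(f"ATT(2012, {t}): never-treated={never:.3f}  not-yet-treated={notyet:.3f}")
```

Both comparison groups should recover an effect near the simulated value of 1.0 here because parallel trends holds by construction. With real data, a gap between the two is itself a finding worth discussing.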
Option 4: Synthetic Control (local data)
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies. JASA, 105(490), 493–505.
Question: Did California's Proposition 99 reduce per-capita cigarette sales?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option4_synthetic_control.ipynb?force_reload=true
Extension ideas: Implement Augmented SCM (Ben-Michael et al. 2021); compare SCM with modern DiD; test donor pool sensitivity (leave-one-out); apply to carbon trading: Guangdong vs donor provinces.
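A leave-one-out donor check can be sketched as below. The sketch makes a simplifying assumption: donor weights are chosen to match pre-period outcomes only, whereas Abadie et al. also match on predictors with a weighting matrix. The data are simulated (not the Proposition 99 data), and `scm_weights` is a hypothetical helper built on SciPy's SLSQP solver.

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(y_pre, Y_pre):
    """Nonnegative donor weights summing to one that best fit the pre-period path."""
    n_donors = Y_pre.shape[1]

    def loss(w):
        return np.sum((y_pre - Y_pre @ w) ** 2)

    def grad(w):
        return -2.0 * Y_pre.T @ (y_pre - Y_pre @ w)

    result = minimize(
        loss,
        x0=np.full(n_donors, 1.0 / n_donors),
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x

# Simulated panel: 15 pre-periods, 10 post-periods, 5 donors.
rng = np.random.default_rng(1)
n_pre, n_post, n_donors = 15, 10, 5
Y = 10.0 + rng.normal(size=(n_pre + n_post, n_donors)).cumsum(axis=0)
y = Y @ np.array([0.6, 0.4, 0.0, 0.0, 0.0]) + rng.normal(scale=0.05, size=n_pre + n_post)
y[n_pre:] -= 3.0  # simulated post-period treatment effect

Y_pre, Y_post = Y[:n_pre], Y[n_pre:]
y_pre, y_post = y[:n_pre], y[n_pre:]

weights = scm_weights(y_pre, Y_pre)
effect = np.mean(y_post - Y_post @ weights)
print("weights:", np.round(weights, 3), " average gap:", round(effect, 3))

# Leave-one-out: drop each donor with non-trivial weight and re-estimate.
for j in np.flatnonzero(weights > 0.01):
    keep = [k for k in range(n_donors) if k != j]
    w_loo = scm_weights(y_pre, Y_pre[:, keep])
    gap_loo = np.mean(y_post - Y_post[:, keep] @ w_loo)
    print(f"without donor {j}: average gap = {gap_loo:.3f}")
```

If the estimated gap changes sign or size sharply when a single donor is dropped, the result depends on that donor and the report should say so.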
Option 5: Policy Learning (local data)
Kitagawa, T., & Tetenov, A. (2018). Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2), 591–616.
Question: How can optimal treatment assignment rules be learned from data?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option5_policy_learning.ipynb?force_reload=true
Extension ideas: Add fairness constraints (demographic parity); compare policy classes (linear vs tree vs budget-constrained); use IPW for evaluation; apply to optimal targeting of poverty alleviation resources.
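The "use IPW for evaluation" idea can be sketched as follows. The estimated value of a rule is the inverse-propensity-weighted mean outcome among units whose observed treatment agrees with the rule, and the empirical welfare maximizer is the rule in a chosen class with the highest estimated value. The sketch assumes a randomized experiment with known propensity 0.5, simulated data, and a one-dimensional threshold class. `ipw_policy_value` is a hypothetical name.

```python
import numpy as np

def ipw_policy_value(y, d, propensity, treat_rule):
    """IPW estimate of the mean outcome if everyone followed treat_rule (boolean array)."""
    weight = np.where(treat_rule, d / propensity, (1 - d) / (1 - propensity))
    return np.mean(y * weight)

# Simulated experiment: treatment helps when x < 0.5 and hurts when x > 0.5.
rng = np.random.default_rng(7)
n = 20000
x = rng.uniform(size=n)
propensity = np.full(n, 0.5)
d = rng.binomial(1, propensity)
y = x + d * (2.0 - 4.0 * x) + rng.normal(scale=0.5, size=n)

# Policy class: "treat if x <= c" for thresholds c on a grid.
thresholds = np.linspace(0.0, 1.0, 21)
values = [ipw_policy_value(y, d, propensity, x <= c) for c in thresholds]
best = int(np.argmax(values))
best_c, best_value = thresholds[best], values[best]

treat_none = ipw_policy_value(y, d, propensity, np.zeros(n, dtype=bool))
treat_all = ipw_policy_value(y, d, propensity, np.ones(n, dtype=bool))
print(f"treat none: {treat_none:.3f}  treat all: {treat_all:.3f}")
print(f"best threshold: c={best_c:.2f} with estimated value {best_value:.3f}")
```

Because the best rule is chosen and evaluated on the same sample, its estimated value is optimistic. A held-out evaluation sample is a natural robustness check for the report.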
Option 6: Text as Data & EPU (local data)
Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. QJE, 131(4), 1593–1636.
Question: How can economic policy uncertainty (EPU) be measured from text, and how does it affect macro outcomes?
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option6_epu_text_analysis.ipynb
https://colab.research.google.com/github/JasmineHao/JasmineHao.github.io/blob/main/econ6083/final-project/notebooks/option6_china_epu_extension.ipynb?force_reload=true
Extension ideas: Use BERT for sentiment analysis (vs keyword matching); build China-specific EPU index with Jieba; create separate fiscal/monetary/trade indices; compare keyword-based vs ML-based measures.
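The keyword-matching baseline that the ML-based measures are compared against can be sketched in a few lines. An article counts toward EPU when it contains at least one term from each of three sets (economy, uncertainty, policy). The term lists below follow the US sets described by Baker, Bloom and Davis, the toy articles are invented for illustration, and plain substring matching is a crude simplification. The full index also scales counts by newspaper, standardizes, and normalizes to a mean of 100, which this sketch omits.

```python
from collections import defaultdict

ECONOMY = ("economic", "economy")
UNCERTAINTY = ("uncertain", "uncertainty")
POLICY = ("congress", "deficit", "federal reserve",
          "legislation", "regulation", "white house")

def is_epu_article(text):
    """True if the text has at least one term from each of the three sets."""
    lowered = text.lower()
    return all(
        any(term in lowered for term in group)
        for group in (ECONOMY, UNCERTAINTY, POLICY)
    )

def monthly_epu_share(articles):
    """articles: iterable of (month, text). Returns {month: share of EPU articles}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for month, text in articles:
        totals[month] += 1
        hits[month] += is_epu_article(text)
    return {month: hits[month] / totals[month] for month in sorted(totals)}

# Invented toy corpus, for illustration only.
articles = [
    ("2020-01", "Congress debates the deficit as economic uncertainty grows."),
    ("2020-01", "Local team wins the championship."),
    ("2020-02", "The economy is uncertain ahead of new regulation."),
    ("2020-02", "Federal Reserve holds rates; economy steady."),
    ("2020-02", "Uncertainty over the White House budget clouds the economic outlook."),
    ("2020-02", "Legislation stalls amid uncertainty about the economy."),
]

shares = monthly_epu_share(articles)
print(shares)
```

For the China extension, the same structure applies after tokenizing with Jieba and swapping in Chinese term lists. Which terms belong in each list is a modeling choice the report should justify.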

📚 Track 2: Literature Review

If you prefer a theoretical project, conduct a structured literature review instead of empirical replication.

How to do a snowball search

  1. Pick a seed paper from class (e.g., Mullainathan & Spiess 2017 on ML in policy; Gu, Kelly & Xiu 2020 on asset pricing).
  2. Backward search: Read the bibliography of your seed paper to find foundational work.
  3. Forward search: Use Google Scholar "Cited by" to find recent 2025–2026 developments.
  4. Categorize: Group papers by prediction (E[Y|X]) vs. causal inference vs. policy.
  5. Synthesize: Don't just list papers; compare them. How do newer ML methods improve on traditional econometrics?
  6. Identify the gap: What is missing? Structural interpretation? Selection bias handling? External validity?

Suggested Structure

Section | Length | Content
--- | --- | ---
Introduction | 1–2 pages | Motivation, scope, research question
Conceptual Framework | 2–3 pages | Taxonomy: prediction vs. causal inference
Literature Review | 6–8 pages | Thematic synthesis of 10–15 core papers
Gap & Future Directions | 2–3 pages | Critical assessment of missing links
Conclusion | 1 page | Summary and takeaways

Grading Rubric

Literature Coverage (35%): Relevance, breadth, snowball search quality
Analysis & Synthesis (35%): Thematic organization, critical comparison
Writing (20%): Clarity, organization, professionalism
Original Insight (10%): Quality of identified gaps

📊 Report Requirements (Both Tracks)

Format

Track 1 Rubric

Replication: 40%
Extension: 30%
Writing: 20%
Code: 10%

How to Write Each Section (Track 1)

See the Writing & LaTeX Practical Guide slides for detailed section-by-section guidance, examples, and templates. Below is a quick reference.

Abstract (150–200 words)

One paragraph covering: (i) original paper, (ii) method, (iii) replication finding, (iv) extension & result. No citations.

1. Introduction (1–2 pages)

Reverse pyramid: broad motivation → specific paper → your replication → your extension → roadmap. Your own work should occupy at least half the section.

2. Literature (1–2 pages)

Group thematically (method, applications, China-related). Use transition sentences. End with "Our paper contributes by..."

3. Data (1–2 pages)

State source, sample size, unit of analysis, and exclusions. Include Table 1 (summary statistics) with only variables you use.
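A possible skeleton for Table 1, using the booktabs package that the starter template already loads. The variable names are placeholders and the `...` cells are left for your own numbers. The layout is one common convention, not a required format.

```latex
\begin{table}[htbp]
  \centering
  \caption{Summary statistics}
  \label{tab:summary}
  \begin{tabular}{lrrrr}
    \toprule
    Variable & Mean & Std.\ dev. & Min & Max \\
    \midrule
    Outcome (units)     & ... & ... & ... & ... \\
    Treatment indicator & ... & ... & ... & ... \\
    Covariate 1         & ... & ... & ... & ... \\
    \midrule
    Observations        & \multicolumn{4}{r}{...} \\
    \bottomrule
  \end{tabular}
\end{table}
```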

4. Empirical Strategy (2–3 pages)

Write the estimating equation, define every symbol, state identification assumptions, and explain standard errors.
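For instance, a report built on a staggered policy might present a two-way fixed effects event-study specification as below. This only illustrates the level of notation and symbol definition expected in this section. The specification, symbols, and clustering level are assumptions for the example, not a required model, and Option 3 is largely about when this regression is misleading under staggered adoption.

```latex
\begin{equation}
  Y_{it} = \alpha_i + \lambda_t
         + \sum_{k \neq -1} \beta_k \, \mathbf{1}\{t - G_i = k\}
         + \varepsilon_{it}
  \label{eq:event-study}
\end{equation}
where $Y_{it}$ is the outcome for unit $i$ in year $t$, $\alpha_i$ and
$\lambda_t$ are unit and year fixed effects, $G_i$ is the first treatment
year of unit $i$ (the indicators are all zero for never-treated units), and
$\beta_k$ measures the effect $k$ years relative to adoption, normalized to
zero at $k = -1$. Identification requires parallel trends and no
anticipation; standard errors are clustered by unit.
```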

5. Replication Results (3–4 pages)

Reproduce the core result, compare to the original, and run 2–3 robustness checks. Include figures (event-study, CATE, etc.).

6. Extension (2–3 pages)

Motivate the new question, explain the method difference, present 1–2 tables/figures, and interpret magnitudes economically.

7. Conclusion (0.5–1 page)

One sentence each for replication and extension findings, one explicit limitation, and one future direction. Synthesize, don't restate.

Tables & Figures Checklist

LaTeX Starter Template

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx, amsmath, natbib, booktabs, hyperref}

\title{Replication and Extension of [Paper Title]}
\author{Your Name \\ ECON6083}
\date{\today}

\begin{document}
\maketitle

\begin{abstract}
\noindent Brief summary of paper, replication approach,
key findings, and extension. (150--200 words)
\end{abstract}

\section{Introduction}
\section{Literature}
\section{Data}
\section{Empirical Strategy}
\section{Replication Results}
\section{Extension}
\section{Conclusion}

\bibliographystyle{aer}
\bibliography{references}

\appendix
\section{Additional Results}
\section{Code}
\end{document}

How to Write Each Section (Track 2)

See the Writing & LaTeX Practical Guide slides for detailed section-by-section guidance, examples, and templates. Below is a quick reference.

1. Introduction (1–2 pages)

Broad motivation → scope of review → research question → roadmap. Be specific about the economic subdomain.

2. Conceptual Framework (2–3 pages)

Build a 2×2 taxonomy (e.g., prediction vs. causal × cross-sectional vs. panel). Define key terms and highlight trade-offs.

3. Literature Review (6–8 pages)

Organize thematically (not chronologically): foundational methods, applications, and criticisms. Compare, don't list.

4. Gap & Future Directions (2–3 pages)

Identify methodological, empirical, and external-validity gaps. Propose 2–3 concrete research projects to fill them.

5. Conclusion (1 page)

Summarize 2–3 takeaways, restate the most important gap, and end with a forward-looking sentence.

🇨🇳 Chinese Datasets (Locally Hosted)

All China data is bundled locally. No registration or external download required.

Dataset | Scope | Size | Best For
--- | --- | --- | ---
CFPS Panel (data/cfps_panel.csv) | 21,233 individuals, 7 waves (2010–2022), 52 vars | 22.5 MB | Options 1, 2, 5
City Panel + Policies (data/china_city_panel_with_policies.csv) | 300 cities × 34 years (1990–2023), 40 economic vars + 5 policy indicators | 3.1 MB | Options 3, 4

Policy Indicators (all pre-merged in the City Panel)

Policy | Indicator Columns | Treated | Batches | Method
--- | --- | --- | --- | ---
LCCP (Low-Carbon City Pilots) | lccp_treat, lccp_batch, lccp_first_year | 49 cities | 2010 / 2012 / 2017 | Staggered DiD
Sponge City | sponge_treat, sponge_batch, sponge_first_year | 23 cities | 2015 / 2016 | DiD or SCM
CBEC (Cross-Border E-Commerce) | cbec_treat, cbec_batch, cbec_first_year | 99 cities | 2015 / 2016 / 2018 / 2019 / 2020 | Staggered DiD
LTCI (Long-Term Care Insurance) | ltci_treat, ltci_batch, ltci_first_year | 26 cities | 2016 / 2020 | Staggered DiD
Smart City | smart_treat, smart_batch, smart_first_year | 115 cities | 2013 / 2014 / 2015 | Staggered DiD

All policy indicators are already merged into china_city_panel_with_policies.csv. Each policy has three columns: _treat (ever treated = 1), _batch (which batch), and _first_year (adoption year, 0 for never-treated). Load one file and run DiD/SCM immediately; no merges needed.
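Because _first_year is 0 for never-treated cities, a naive `year >= first_year` comparison would mark every never-treated row as post-treatment. A minimal sketch of building the post-treatment dummy and event time from these columns is below. It uses a toy frame in place of the real file, and the `city` and `year` column names are assumptions to check against the actual CSV header.

```python
import pandas as pd

# Toy rows standing in for china_city_panel_with_policies.csv.
# "city" and "year" are assumed column names; verify them in the real file.
panel = pd.DataFrame({
    "city": ["A"] * 4 + ["B"] * 4,
    "year": [2010, 2011, 2012, 2013] * 2,
    "lccp_treat": [1] * 4 + [0] * 4,
    "lccp_first_year": [2012] * 4 + [0] * 4,
})

# Guard against the 0 code for never-treated cities before comparing years.
ever_treated = panel["lccp_first_year"] > 0
panel["lccp_post"] = (
    ever_treated & (panel["year"] >= panel["lccp_first_year"])
).astype(int)

# Event time is defined only for ever-treated cities; other rows stay missing.
panel["lccp_event_time"] = (
    panel["year"] - panel["lccp_first_year"]
).where(ever_treated)

print(panel)
```

With the real data, the first step would be something like `pd.read_csv("data/china_city_panel_with_policies.csv")` in place of the toy frame.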

Direct Download Links

# CFPS Panel
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/cfps_panel.csv

# City Panel + All 5 Policy Indicators
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_city_panel_with_policies.csv

# China EPU / CPI / PMI / LPR
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_epu.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_cpi.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_pmi.csv
https://raw.githubusercontent.com/JasmineHao/JasmineHao.github.io/main/econ6083/final-project/notebooks/data/china_lpr.csv

🇨🇳 Suggested Extensions with Local Data

Each option below pairs the original US/EU paper with a concrete China extension using the datasets above. All regressions use the panel structure of the data.

Option 1: DML → CFPS Internet Access

Question: What is the causal effect of internet access on household income?

Option 2: Causal Forests → CFPS Health Insurance Heterogeneity

Question: Who benefits most from medical insurance?

Option 3: DiD → Low-Carbon City Pilots (LCCP)

Question: Did low-carbon city pilots reduce SO₂ emissions?

Option 3b: DiD → Sponge City Pilots

Question: Did sponge city investment increase fiscal expenditure?

Option 3c: DiD → CBEC Pilot Zones

Question: Did CBEC zones boost exports and FDI?

Option 3d: DiD → Smart City Pilots

Question: Did smart city construction improve environmental outcomes?

Option 3e: DiD → LTCI Pilots

Question: Did long-term care insurance affect fiscal health spending?

Option 4: SCM → Any Single-Treated City

Question: What is the effect of a specific policy on a single city?

Option 5: Policy Learning → CFPS Digital Inclusion

Question: Who should be targeted for digital-inclusion subsidies?

Option 6: Text/EPU → China EPU + Macro

Question: How does policy uncertainty affect industrial production?

📅 Deadline: May 5, 2026. Start early: replication and debugging always take longer than expected. Come to office hours if you get stuck.

🛠️ Resources

Required Python packages

pip install numpy pandas matplotlib scikit-learn scipy statsmodels

Optional packages

pip install econml              # DML, Causal Forests
pip install synthdid            # Synthetic Control
pip install linearmodels        # Panel data
pip install transformers        # BERT for text
pip install spacy nltk jieba    # Text processing
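To confirm the environment before running a notebook, a small check like the one below can list which packages are missing. It is only a convenience sketch. The import names are assumptions where they differ from the pip names (scikit-learn is imported as `sklearn`; `synthdid` is assumed to import under the same name), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names, which can differ from pip names (scikit-learn -> sklearn).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "scipy", "statsmodels"]
OPTIONAL = ["econml", "synthdid", "linearmodels", "transformers",
            "spacy", "nltk", "jieba"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

print("Missing required packages:", missing_modules(REQUIRED) or "none")
print("Missing optional packages:", missing_modules(OPTIONAL) or "none")
```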

Helpful links

Report checklist