🎓 Final Project

Objective: Choose one of two tracks below. Both tracks require a 12–15 page academic report in LaTeX.

📋 Project Overview

| Track | Requirement |
|---|---|
| Track 1 | Replicate and extend a landmark causal ML paper using original (or similar) data |
| Track 2 | Conduct a structured literature review on a specific ML-in-economics domain |
| Report | 12–15 page academic paper in LaTeX |
| Weight | 40% of final grade |
| Due Date | May 5, 2027 |

🔬 Track 1: Empirical Replication & Extension

Choose one of six landmark papers in causal machine learning, replicate its core results, and propose a meaningful extension.

🎯 Six Paper Options

Option 1: 401(k) and Savings

Paper: Chernozhukov et al. (2018) - Double Machine Learning

Question: What is the causal effect of 401(k) eligibility on net financial assets?

Data

Source: 1991 Survey of Income and Program Participation (SIPP), ~9,915 observations

# Download Option A: DoubleML package
pip install doubleml
from doubleml.datasets import fetch_401K
data = fetch_401K()

# Download Option B: U.S. Census Bureau SIPP
# https://www.census.gov/programs-surveys/sipp.html

Replication Goals

Starter Code

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def dml_plr(Y, D, X, ml_g, ml_m, n_folds=5):
    """
    Double Machine Learning for Partially Linear Model
    Y = θ*D + g(X) + ε
    
    TODO: 
    1. Split data into K folds
    2. Train ml_g (outcome model) and ml_m (propensity model)
       on training set, predict on test set
    3. Compute residuals: Ỹ = Y - ĝ(X), D̃ = D - m̂(X)
    4. Estimate θ = mean(Ỹ * D̃) / mean(D̃²)
    5. Compute standard errors
    
    Returns: theta, standard_error
    """
    n = len(Y)
    Y_hat = np.zeros(n)
    D_hat = np.zeros(n)
    
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)
    
    for train_idx, test_idx in kf.split(X):
        # YOUR CODE HERE
        pass
    
    return theta, se
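
As a sanity check on steps 4 and 5, the final-stage estimator can be verified on simulated residuals where the true θ is known. This is illustrative only: the residuals below are generated directly rather than produced by cross-fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
theta_true = 2.0

# Pretend cross-fitting already produced out-of-fold residuals:
D_tilde = rng.normal(size=n)                          # D - m_hat(X)
Y_tilde = theta_true * D_tilde + rng.normal(size=n)   # Y - g_hat(X)

# Step 4: theta = mean(Y_tilde * D_tilde) / mean(D_tilde^2)
theta_hat = np.mean(Y_tilde * D_tilde) / np.mean(D_tilde**2)

# Step 5: standard error from the influence function
psi = (Y_tilde - theta_hat * D_tilde) * D_tilde
se = np.sqrt(np.mean(psi**2) / np.mean(D_tilde**2) ** 2 / n)

print(theta_hat, se)  # theta_hat should be close to 2.0
```

With n = 10,000 the estimate should land within a couple of standard errors of 2.0; if it does not, the residualization or the formula is wrong.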

Extension Ideas

Option 2: NSW Job Training

Paper: Athey & Imbens (2016) - Causal Trees and Forests

Question: Are the effects of job training heterogeneous? Who benefits most?

Data

Source: LaLonde (1986) NSW data (297 treated, 425 control)

# Download Option A: Dehejia-Wahba
# https://users.nber.org/~rdehejia/data/.nswdata2.html

# Download Option B: causaldata package
pip install causaldata
from causaldata import lalonde

Replication Goals

Starter Code

class CausalTree:
    """
    Key difference from standard tree:
    - Standard: minimize prediction MSE
    - Causal: maximize |τ_left - τ_right| (treatment effect heterogeneity)
    """
    
    def fit(self, X, Y, D):
        """
        X: covariates, Y: outcome, D: treatment
        
        TODO: Implement recursive partitioning that maximizes
        treatment effect differences between leaves
        """
        pass
    
    def _treatment_effect(self, Y, D):
        """τ = E[Y|D=1] - E[Y|D=0]"""
        # YOUR CODE HERE
        pass
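
To see what the splitting criterion rewards, here is how a single candidate split could be scored on simulated data with known heterogeneity. This is an illustration of the objective, not part of the required implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
X = rng.uniform(size=n)               # one covariate
D = rng.integers(0, 2, size=n)        # randomized treatment
# True effect: 1 for X < 0.5, 3 for X >= 0.5
Y = np.where(X < 0.5, 1.0, 3.0) * D + rng.normal(size=n)

def naive_tau(Y, D):
    """Difference in means within a leaf: E[Y|D=1] - E[Y|D=0]."""
    return Y[D == 1].mean() - Y[D == 0].mean()

# Score the candidate split X < 0.5 by the effect gap it creates
left = X < 0.5
score = abs(naive_tau(Y[left], D[left]) - naive_tau(Y[~left], D[~left]))
print(score)  # close to |1 - 3| = 2
```

A causal tree searches over candidate splits and prefers the one with the largest score, whereas a standard regression tree would ignore D entirely.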

Extension Ideas

Option 3: Minimum Wage

Paper: Callaway & Sant'Anna (2021) - Staggered DiD

Question: Effect of minimum wage increases with staggered adoption?

Data

Source: Card & Krueger (1994) OR state-level panel

# Option A: Card-Krueger replication
# https://github.com/tyleransom/DiD-example

# Option B: QCEW data
# https://www.bls.gov/cew/

Replication Goals

Starter Code

def compute_att_gt(df, group, time):
    """
    Compute ATT(g,t) following Callaway & Sant'Anna (2021)
    
    Group g: units first treated at time g
    Compare to never-treated or not-yet-treated units,
    using the period just before treatment (g-1) as the baseline
    
    ATT(g,t) = [E[Y_t|g] - E[Y_{g-1}|g]] - [E[Y_t|control] - E[Y_{g-1}|control]]
    
    TODO: Implement group-time ATT calculation
    """
    pass

def event_study_plot(df):
    """
    Plot dynamic treatment effects by event time
    X-axis: Time relative to treatment (-5 to +5)
    Y-axis: Treatment effect with confidence intervals
    """
    pass
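
The ATT(g,t) formula reduces to a 2×2 difference-in-differences once the cohort and periods are fixed. A sketch on a simulated balanced panel (column names and the data-generating process here are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
rows = []
for i in range(200):
    cohort = 3 if i < 100 else 0          # first treated at t=3; 0 = never treated
    alpha = rng.normal()                  # unit fixed effect
    for t in range(1, 6):
        effect = 2.0 if (cohort != 0 and t >= cohort) else 0.0
        rows.append({"id": i, "t": t, "g": cohort,
                     "y": alpha + 0.5 * t + effect + rng.normal(scale=0.1)})
df = pd.DataFrame(rows)

def att_gt(df, g, t):
    """ATT(g, t): cohort g vs. never-treated, base period g - 1."""
    treat, never = df["g"] == g, df["g"] == 0
    d_treat = (df.loc[treat & (df["t"] == t), "y"].mean()
               - df.loc[treat & (df["t"] == g - 1), "y"].mean())
    d_ctrl = (df.loc[never & (df["t"] == t), "y"].mean()
              - df.loc[never & (df["t"] == g - 1), "y"].mean())
    return d_treat - d_ctrl

att = att_gt(df, g=3, t=4)
print(att)  # close to the true effect of 2.0
```

The full estimator loops this comparison over all (g, t) pairs and then aggregates, e.g. by event time for the event-study plot.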

Extension Ideas

Option 4: Proposition 99

Paper: Abadie et al. (2010) - Synthetic Control Method

Question: Did California's tobacco tax reduce cigarette sales?

Data

Source: State-level cigarette consumption (39 states × 31 years)

# Option A: synthdid package
pip install synthdid
from synthdid import get_data
california = get_data('california_prop99')

# Option B: Ortega database
# https://github.com/NiclasOrtG/the-synthetic-control-group

Replication Goals

Starter Code

import numpy as np
from scipy.optimize import minimize

def synthetic_control(Y_target, Y_donors):
    """
    Find optimal weights w that minimize ||Y_target - Y_donors @ w||²
    Subject to: sum(w) = 1, w >= 0 (convex combination)
    
    Y_target: (T_pre,) pre-treatment outcomes for treated unit
    Y_donors: (T_pre, J) pre-treatment outcomes for donor pool
    
    Returns: optimal weights (J,)
    """
    def objective(w):
        return np.sum((Y_target - Y_donors @ w) ** 2)
    
    # TODO: Use scipy.optimize.minimize with constraints
    pass
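
The constrained least-squares problem in the TODO can be handed to scipy.optimize.minimize with the SLSQP method, using bounds for w ≥ 0 and an equality constraint for sum(w) = 1. A sketch on synthetic data where the true weights are known by construction:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T_pre, J = 20, 5
Y_donors = rng.normal(size=(T_pre, J))
w_true = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
Y_target = Y_donors @ w_true              # treated unit lies in the donor span

def objective(w):
    return np.sum((Y_target - Y_donors @ w) ** 2)

res = minimize(
    objective,
    x0=np.full(J, 1.0 / J),               # start from equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * J,              # enforces w >= 0
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum(w) = 1
)
w_hat = res.x
print(np.round(w_hat, 2))  # close to [0.6, 0.4, 0, 0, 0]
```

Because the objective is a convex quadratic and the feasible set is the simplex, SLSQP should recover the true weights here; on real data the treated unit is generally not an exact convex combination of donors, so the pre-treatment fit will not be perfect.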

Extension Ideas

Option 5: Policy Learning

Paper: Athey & Wager (2021) - Policy Learning

Question: How to learn optimal treatment assignment rules?

Data

Source: JTPA dataset (~11,600 participants)

# Option A: econml
from econml.datasets import fetch_jtpa

# Option B: MDRC (requires application)
# https://www.mdrc.org/

Replication Goals

Starter Code

def doubly_robust_scores(Y, D, X):
    """
    Compute doubly robust scores:
    Γ = μ₁(X) - μ₀(X) + D(Y - μ₁(X))/e(X) - (1-D)(Y - μ₀(X))/(1-e(X))
    
    where μ₁, μ₀ are outcome models and e(X) is propensity score
    
    TODO: Implement with cross-fitting
    """
    pass

def learn_policy(X, dr_scores, budget=0.5):
    """
    Learn policy π(X) that maximizes welfare subject to budget
    
    max E[π(X) * Γ]  s.t. E[π(X)] ≤ budget
    
    Solution: Treat if DR score > threshold
    """
    pass
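
Given DR scores, the budget-constrained policy is just a quantile threshold: treat the budget share of units with the largest scores. A sketch with simulated scores standing in for the output of doubly_robust_scores:

```python
import numpy as np

rng = np.random.default_rng(4)
dr_scores = rng.normal(size=1_000)        # stand-in for real DR scores

def threshold_policy(dr_scores, budget=0.5):
    """Treat the `budget` fraction of units with the largest DR scores."""
    cutoff = np.quantile(dr_scores, 1.0 - budget)
    return (dr_scores > cutoff).astype(int)

pi = threshold_policy(dr_scores, budget=0.3)
print(pi.mean())                          # share treated, ~0.30
print(np.mean(pi * dr_scores))            # estimated welfare E[pi(X) * Gamma]
```

In the full exercise the policy is restricted to a class of interpretable rules (e.g., depth-2 trees in X), which is where the learning problem becomes nontrivial.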

Extension Ideas

Option 6: Economic Policy Uncertainty

Paper: Baker et al. (2016) - EPU Index

Question: How to measure EPU from text? How does it affect outcomes?

Data

Source: Newspaper archives OR EPU index directly

# Option A: EPU index (monthly)
# https://www.policyuncertainty.com/

# Option B: News articles (requires ProQuest/Factiva)
# Python: newspaper3k, beautifulsoup

# Option C: GDELT project
# https://www.gdeltproject.org/

Replication Goals

Starter Code

UNCERTAINTY_TERMS = ['uncertain', 'uncertainty', 'risk', 'volatile']
POLICY_TERMS = ['policy', 'regulation', 'legislation', 'government']
ECONOMIC_TERMS = ['economy', 'growth', 'recession', 'inflation']

def count_epu_articles(articles):
    """
    Article counts as EPU if it contains:
    - At least one uncertainty term AND
    - At least one policy term AND
    - At least one economic term
    
    EPU Index = (EPU articles / Total articles) × Normalization
    """
    pass
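
The triple keyword condition can be sketched with naive substring matching. The real index uses newspaper-specific term lists, careful tokenization, and per-paper normalization; this toy version only illustrates the counting rule on two made-up articles:

```python
UNCERTAINTY_TERMS = ['uncertain', 'uncertainty', 'risk', 'volatile']
POLICY_TERMS = ['policy', 'regulation', 'legislation', 'government']
ECONOMIC_TERMS = ['economy', 'growth', 'recession', 'inflation']

def is_epu(text):
    """True if the article mentions at least one term from every category."""
    t = text.lower()
    return all(any(term in t for term in terms)
               for terms in (UNCERTAINTY_TERMS, POLICY_TERMS, ECONOMIC_TERMS))

articles = [
    "Uncertainty over new government regulation weighs on the economy.",
    "Local team wins the championship.",
]
share = sum(is_epu(a) for a in articles) / len(articles)
print(share)  # 0.5 -- only the first toy article satisfies all three conditions
```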

Extension Ideas

📚 Track 2: Literature Review

If you prefer a theoretical / survey-oriented project, you may conduct a literature review instead of an empirical replication. Depending on the topic, you can use the papers provided in class as a seed or explore an area we haven't covered.

Getting Started

  1. Pick a Specific Domain: Narrow your focus to a concrete application rather than surveying ML in economics broadly.
  2. The Snowball Search: Use the bibliography of your "seed paper" to find foundational work (Backward Search) and use Google Scholar's "Cited by" to find recent 2025–2026 developments (Forward Search).
  3. Categorize by Approach: Group papers by whether they focus on Prediction (E[Y|X]) or Causal Inference (e.g., Double ML).
  4. Identify the Gap: Note what is missing. For instance, does a finance paper lack a structural interpretation, or does a policy paper ignore selection bias?
  5. Synthesize: Don't just list papers. Compare them—explain how newer ML methods improve upon traditional econometric benchmarks.

Report Structure (Track 2)

| Section | Recommended Length | Content |
|---|---|---|
| Introduction | 1–2 pages | Motivation, research question, scope of the review |
| Conceptual Framework | 2–3 pages | Key concepts, taxonomy of methods (Prediction vs. Causal Inference) |
| Literature Review | 6–8 pages | Thematic synthesis of 10–15 core papers; backward and forward search |
| Gap & Future Directions | 2–3 pages | Critical assessment of missing links and promising extensions |
| Conclusion | 1 page | Summary and takeaways |

Grading Rubric (Track 2)

| Component | Points | Criteria |
|---|---|---|
| Literature Coverage | 35% | Relevance, breadth, and use of snowball search |
| Analysis & Synthesis | 35% | Thematic organization, critical comparison, not just listing |
| Writing | 20% | Clarity, organization, professionalism |
| Original Insight | 10% | Quality of identified gaps and future directions |

📊 Report Requirements

Format

Suggested Structure

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx, amsmath, natbib}

\title{Replication and Extension of [Paper Title]}
\author{Your Name\\ECON6083}
\date{\today}

\begin{document}

\maketitle

\begin{abstract}
Brief summary of paper, replication approach, key findings, and extension. 
(150-200 words)
\end{abstract}

\section{Introduction}  % 1-2 pages
% Motivation, research question, overview of replication and extension

\section{Literature and Background}  % 1-2 pages
% Related literature, theoretical framework, institutional details

\section{Data}  % 1-2 pages
% Data sources, sample construction, summary statistics (Table 1)

\section{Empirical Strategy}  % 2-3 pages
% Model specification, identification assumptions, estimation details

\section{Replication Results}  % 3-4 pages
% Main results, robustness checks, comparison to original paper

\section{Extension}  % 2-3 pages
% Motivation, methodology, results, interpretation

\section{Conclusion}  % 0.5-1 page
% Summary, limitations, future research

\bibliographystyle{aer}
\bibliography{references}

\appendix
\section{Additional Results}
\section{Code}

\end{document}

Grading Rubric (Track 1)

| Component | Points | Criteria |
|---|---|---|
| Replication | 40% | Accuracy, completeness, comparison to original |
| Extension | 30% | Originality, motivation, execution |
| Writing | 20% | Clarity, organization, professionalism |
| Code | 10% | Reproducibility, documentation, style |

🛠️ Technical Resources

Required Packages

pip install numpy pandas matplotlib scikit-learn scipy statsmodels

Optional Packages

pip install econml              # DML, Causal Forests
pip install synthdid            # Synthetic Control
pip install linearmodels        # Panel data
pip install transformers        # BERT for text
pip install spacy nltk jieba    # Text processing

📅 Timeline

| Week | Task |
|---|---|
| Week 8–9 | Choose paper, download data, start replication |
| Week 10 | Complete replication, identify extension idea |
| Week 11 | Implement extension |
| Week 12 | Write report, prepare submission |
| May 5 | Final deadline |

⚠️ Important: Start early! Data access and replication often take longer than expected. Come to office hours if you get stuck.