Replication package for “The Contribution of National Income Inequality to Regional Economic Divergence”

This package contains code and some datasets to replicate the analysis contained in the paper and supplementary materials. In addition to the data stored in this package, the full analysis requires Census microdata, which can be downloaded for free from usa.ipums.org. Please make sure to download the following variables:

CNTYGP98
PUMA
HHINCOME
FAMUNIT
FAMSIZE
SEX
AGE
RACE
HISPAN
BPL
EDUC
LABFORCE
EMPSTAT
OCC1990
IND1990
INCTOT
FTOTINC
(Note: Not all of these variables are used in the final analysis, but each is referenced somewhere in the code.)

For the following samples:
1980 5% state
1990 5% state
2000 5%
2010 ACS 5 year
2015 ACS 5 year

Once you have downloaded the data, use the IPUMS-provided .do files to convert it into Stata format, and save it in the "data/ipums" folder. Then run the following scripts, in order (making sure to change your working directory as appropriate).
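
The run order described here can be sketched as a small driver. This is a minimal sketch, not part of the package: run_pipeline and the code_dir argument are hypothetical, the script names are taken from the sections below, and it assumes the scripts communicate only through files on disk.

```r
# Hypothetical driver: sources each script in the order given and stops
# early if any script file is missing.
run_pipeline <- function(scripts, code_dir = ".") {
  paths <- file.path(code_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p)  # same effect as running the script by hand
  }
  invisible(paths)
}

# Script names in the order this README lists them.
pipeline_scripts <- c(
  "prep_puma_lookups.R", "prep_map_key_county.R",
  "bea_make_cities.R", "bea_analyze_cities.R", "bea_maps_cities.R",
  "ipums_prep.R", "ipums_percentiles.R", "ipums_analyze_cities.R",
  "ipums_polarization.R", "ipums_droptop.R", "ipums_segregation.R",
  "ipums_sorting.R", "ipums_sortgraph.R", "ipums_counterfactuals.R",
  "util_stylized_cities.R", "util_sigmagraphs.R", "util_census_gdp_ratio.R"
)

# Usage (the path is a placeholder):
# run_pipeline(pipeline_scripts, code_dir = "path/to/package")
```

Because ipums_prep.R and ipums_sorting.R are described as very heavy, it may be more practical to pass a subset of pipeline_scripts and run the pipeline in stages.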

***Prep code
prep_puma_lookups.R - makes crosswalks from PUMAs to 2013 MSAs and 1990 CZs
	input: 	data/reference/stfips.csv
		data/reference/CountyMSA_2013.csv
		data/reference/czlma903.csv
		data/reference/cznames_chetty2014.csv
		data/bea/st02_alaska_lookup.csv
		data/cz/cznames_chetty2014.csv
		data/mable/geocorr2010.csv
		data/mable/geocorr2000.csv
		data/mable/geocorr90.csv
		data/mable/cg98stat.csv

	output:	output/mable/puma1_to_msa.csv
		output/mable/puma1_to_cz.csv
		output/mable/county_msa_list.csv
		output/mable/puma0_to_msa.csv
		output/mable/puma0_to_cz.csv
		output/mable/puma9_to_msa.csv
		output/mable/puma9_to_cz.csv
		output/mable/puma8_to_msa.csv
		output/mable/puma8_to_cz.csv

prep_map_key_county.R - Makes a list mapping county FIPS to the lowercase county name for mapping with the maps package in R. Also includes MSA and CZ codes.
	input: 	data/reference/stfips.csv
		data/reference/CountyMSA_2013.csv
		data/reference/czlma903.csv
	output: output/mable/cntymsakey.csv	



***BEA per capita personal income analysis for supplement

bea_make_cities.R - Takes BEA county GDP data and converts it to inflation-adjusted MSA and CZ per capita personal income datasets
	input: 	data/reference/cpirs.dta
		output/mable/county_msa_list.csv
		data/reference/czlma903.csv
		data/reference/cznames_chetty2014.csv
		data/bea/st51_va_cou_xwalk.csv
		data/bea/st02_alaska_lookup.csv
		data/bea/CA1/CA1_1969_2015__ALL_AREAS.csv
	output:	output/bea/dta/nat_gdppc.csv
		output/bea/dta/msa_gdppc.csv
		output/bea/dta/cz_gdppc.csv

bea_analyze_cities.R - Takes output from bea_make_cities.R and creates graphs and tables of CZ and MSA personal income per capita trends over time, as shown in supplement
	input:	output/bea/dta/nat_gdppc.csv
		output/bea/dta/msa_gdppc.csv
		output/bea/dta/cz_gdppc.csv
	output:	output/bea/[geog]/devstats 

bea_maps_cities.R - Takes output from bea_make_cities.R and creates maps. 
	input:	output/bea/dta/nat_gdppc.csv
		output/bea/dta/msa_gdppc.csv
		output/bea/dta/cz_gdppc.csv
	output:	output/bea/[geog]/maps



***IPUMS distribution analysis

ipums_prep.R - Takes the very large raw IPUMS data file in Stata format and converts it to MSA and CZ files for each sample year: 1980, 1990, 2000, 2008 (2006-2010 5-year), and 2013 (2011-2015 5-year). Before running this script, you must run the IPUMS .do file to convert the data into Stata format, commenting out the lines that label states and ages with text. This script is very computationally intensive and sometimes crashes R on my computer.

	input:	data/ipums/usa_00029.dta
	output:	data/ipums/ip80cz.RData
		data/ipums/ip80msa.RData
		data/ipums/ip90cz.RData
		data/ipums/ip90msa.RData
		data/ipums/ip00cz.RData
		data/ipums/ip00msa.RData
		data/ipums/ip08cz.RData
		data/ipums/ip08msa.RData
		data/ipums/ip13cz.RData
		data/ipums/ip13msa.RData


ipums_percentiles.R - Computes national income percentiles for various income measures in IPUMS. Distributes duplicate values across buckets evenly so there are always 100 buckets. Computes the mean and median income, count, and percentage of total population in each national percentile, both for the nation and for each metro. Outputs into output/ipums/[metro]/dta/[sample]. Also computes mean, median, observations, and population for the whole nation and each MSA in each sample/year.
entially deflate manually

	input:	data/ipums/ip[yr][geog].RData
	output: output/ipums/dta
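
One way to read "distributes duplicate values across buckets evenly" is rank-based bucketing with ties broken arbitrarily. The following is a minimal sketch under that assumption, not the script's actual code: assign_buckets is a hypothetical helper, the example data are made up, and the sketch ignores the person or household weights that an IPUMS calculation would presumably use.

```r
# Assign each observation to one of n_buckets equal-sized buckets by rank.
# Ties are broken by order of appearance, so a large mass of identical
# values (e.g. many zero incomes) is spread over adjacent buckets
# instead of collapsing into a single bucket.
assign_buckets <- function(income, n_buckets = 100) {
  n <- length(income)
  r <- as.numeric(rank(income, ties.method = "first"))  # unique ranks 1..n
  ((r - 1) * n_buckets) %/% n + 1                       # bucket 1..n_buckets
}

# Made-up example: half the sample reports zero income.
income <- c(rep(0, 500), seq_len(500))
bucket <- assign_buckets(income)

# Per-bucket summaries of the kind the script description mentions.
bucket_means   <- tapply(income, bucket, mean)
bucket_medians <- tapply(income, bucket, median)
bucket_counts  <- table(bucket)
```

With plain quantile cut points, the 500 zeros would all fall below the same thresholds and many buckets would be empty; ranking first guarantees that every bucket is populated.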

ipums_analyze_cities.R - Calculates divergence statistics for IPUMS data and produces maps. Outputs to output/ipums/[geog]/devstats.
	input:	data/mable/cntymsakey.csv
		data/cz/cznames_chetty2014.csv
		output/ipums/[geog]/dta/[geog]_natinfo.csv
	output:	output/ipums/[geog]/devstats 

ipums_polarization.R - Calculates the fraction of the population in 1980 and 2013 living in MSAs with incomes more than 20% above or more than 20% below the national mean, as referenced in the text
	input:	output/ipums/cz/dta/cz_natinfo_80.csv
		output/ipums/cz/dta/cz_natinfo_13.csv
	output:	output/cleangraphs/polarization_ipumsfam.csv
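
The polarization calculation can be illustrated as follows. This is a sketch under stated assumptions rather than the script itself: polarization_shares is a hypothetical function, and the national mean is taken here to be the population-weighted mean of the regional means, which may differ from how the script defines it.

```r
# Share of total population living in regions whose mean income is more
# than `threshold` above or below the national mean.
polarization_shares <- function(mean_income, population, threshold = 0.2) {
  national_mean <- weighted.mean(mean_income, population)  # assumption
  ratio <- mean_income / national_mean
  total <- sum(population)
  c(above = sum(population[ratio > 1 + threshold]) / total,
    below = sum(population[ratio < 1 - threshold]) / total)
}

# Made-up example with four equally sized regions.
shares <- polarization_shares(mean_income = c(100, 130, 70, 100),
                              population  = c(1, 1, 1, 1))
```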

ipums_droptop.R - Constructs counterfactuals where I drop the top 1, 5, and 10% of the national income distribution and recompute beta and sigma divergence measures
	input:	output/ipums/[geog]/dta/[geog]_natinfo.csv
		output/ipums/[geog]/dta/[samp]/[geog]_citymeans_[samp]_80.csv
		output/ipums/[geog]/dta/[samp]/[geog]_pct_nat_[samp]_1980.csv
		output/ipums/[geog]/dta/[samp]/[geog]_pct_city_[samp]_1980.csv
	output:	output/ipums/[geog]/droptop/
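
The idea behind the drop-top counterfactual can be sketched on individual-level data. This is an illustration only: drop_top_means is a hypothetical helper, the data are made up, observations are unweighted, and the actual script works from the percentile files listed above rather than from raw incomes.

```r
# Drop everyone above the national (1 - top_share) income quantile,
# then recompute mean income by region.
drop_top_means <- function(income, region, top_share) {
  cutoff <- quantile(income, probs = 1 - top_share, names = FALSE)
  keep <- income <= cutoff
  tapply(income[keep], region[keep], mean)
}

# Made-up example: region B contains one very high earner.
income <- c(10, 20, 30, 40, 10, 20, 30, 1000)
region <- rep(c("A", "B"), each = 4)

means_all     <- tapply(income, region, mean)
means_dropped <- drop_top_means(income, region, top_share = 0.10)
```

Comparing the dispersion of means_all with that of means_dropped shows how much of the regional gap is driven by the top of the national distribution.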

ipums_segregation.R - Calculates Reardon and Bischoff’s H index of income segregation
	input: 	output/ipums/[geog]/dta/[samp]/[geog]_pct_city_[samp]_[yr].csv
	output:	output/ipums/[geog]/sorting/[samp]/natent_[samp]_[yr].csv
		output/ipums/[geog]/sorting/[samp]/ipums_seg_pctile.pdf
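
For orientation, the building block of the H index is the information theory index for a single income threshold. The sketch below computes that binary version for one threshold. It is an assumption-laden illustration, not the script's code: binary_entropy and binary_h are hypothetical names, the counts are made up, and the rank-order index in the paper aggregates such values across many thresholds.

```r
# Entropy (in bits) of a two-group split, with 0 * log2(0) taken as 0.
binary_entropy <- function(p) {
  ifelse(p <= 0 | p >= 1, 0, -(p * log2(p) + (1 - p) * log2(1 - p)))
}

# Binary information theory index for one income threshold.
#   below: number of people in each region with income below the threshold
#   total: number of people in each region
binary_h <- function(below, total) {
  p_unit <- below / total             # share below threshold, by region
  p_all  <- sum(below) / sum(total)   # share below threshold, overall
  1 - sum(total * binary_entropy(p_unit)) /
    (sum(total) * binary_entropy(p_all))
}

# Made-up example: three regions of equal size.
h_mixed <- binary_h(below = c(90, 50, 10), total = c(100, 100, 100))
```

The index is 0 when every region has the same share below the threshold and 1 when each region is entirely on one side of it.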

ipums_sorting.R - Calculates Zhou’s S and percent of total variation that is across metros. Note: this is extremely computationally intensive and takes a very long time
	input:	data/ipums/ip[yr][geog].RData
	output: output/ipums/[geog]/sorting/ipums_[yr]crossvar_[geog].csv
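
The "percent of total variation that is across metros" can be read as a standard between-group variance share. A minimal sketch under that reading follows. between_share is a hypothetical function, the data are made up, and the sketch is unweighted and uses income in levels, either of which may differ from the script. Zhou's S is not covered here.

```r
# Share of total income variation that lies between regions:
# between-region sum of squares divided by total sum of squares.
between_share <- function(income, region) {
  grand_mean  <- mean(income)
  region_mean <- ave(income, region)   # each person's own region mean
  sum((region_mean - grand_mean)^2) / sum((income - grand_mean)^2)
}

# Made-up example: two regions with means 2 and 4.
share <- between_share(income = c(1, 3, 3, 5),
                       region = c("a", "a", "b", "b"))
```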

ipums_sortgraph.R - Makes Figure 6 showing the H index, percent of total variation that is across metros, and Zhou’s S
	input:	output/ipums/[geog]/sorting/ipums_[yr]crossvar_[geog].csv
		output/ipums/[geog]/sorting/[samp]/natent_[samp]_[yr].csv
	output:	output/ipums/[geog]/sorting/[samp]/ipums_sorting_[geog]_[samp].csv
		output/ipums/[geog]/sorting/[samp]/ipums_sorting_[geog]_[samp].pdf
	
ipums_counterfactuals.R - Constructs beta and sigma counterfactual scenarios based on the output of ipums_percentiles.R, and makes Figures 7 and 8. Also runs robustness checks that vary the percentile bucket size (1, 2, 4, or 5; 2 is used in the paper), the sigma statistic reported (coefficient of variation, IQR, 10-90), the population year used for weighting regions, and the type of income studied.
	input:	output/ipums/[geog]/dta/[geog]_natinfo.csv
		output/ipums/[geog]/dta/[samp]/[geog]_citymeans_[samp]_80.csv
		output/ipums/[geog]/dta/[samp]/[geog]_pct_nat_[samp]_1980.csv
		output/ipums/[geog]/dta/[samp]/[geog]_pct_city_[samp]_1980.csv
	output:	output/ipums/[geog]/counterfactuals
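
The three sigma statistics named above can be sketched as follows. This is an illustration under assumptions, not the script's implementation: sigma_stats is a hypothetical helper, regions are unweighted, and "10-90" is read here as the gap between the 90th and 10th percentiles of regional mean income.

```r
# Dispersion of regional mean incomes, three ways.
sigma_stats <- function(mean_income) {
  q <- quantile(mean_income, c(0.10, 0.25, 0.75, 0.90), names = FALSE)
  c(cv      = sd(mean_income) / mean(mean_income),  # coefficient of variation
    iqr     = q[3] - q[2],                          # 75th minus 25th percentile
    p90_p10 = q[4] - q[1])                          # 90th minus 10th (assumed)
}

# Made-up example: eleven regions with mean incomes 1, 2, ..., 11.
stats <- sigma_stats(1:11)
```

Computing these statistics for each sample year and comparing them over time gives a simple picture of sigma convergence or divergence.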


***Post-analysis and independent utility code
util_stylized_cities.R - Makes stylized cities to demonstrate sorting and inequality mechanisms for Figure 1.
	output:	output/stylized

util_sigmagraphs.R - Makes clean graphs combining various measures of sigma divergence for BEA and IPUMS data for Figure 3.
	input:	output/bea/cz/devstats/bea_cz_devstats.csv
		output/ipums/cz/devstats/dev_cz.csv
	output:	output/sigmagraphs


util_census_gdp_ratio.R - Calculates the fraction of BEA total personal income reported in the Census for 1980 and 2013, as reported in endnote 1.

