Economic and Employment Factors:
U.S. Census Bureau API – Income, unemployment, poverty, etc.
Cost of Living and Housing and Demographics and Migration Trends:
Zillow API – Real estate and housing data
U.S. Census Bureau API – Demographics, housing, etc.
U.S. Census Bureau’s American Community Survey (ACS) – Migration, demographic data
Quality of Life and Environment Trends:
Centers for Medicare & Medicaid Services Data API – Hospital Data
U.S. Census Bureau API – Health Insurance Coverage Data
U.S. Environmental Protection Agency API – Air Pollution Data
NPS Data API – National Park Data
State Policies and Taxation Trends:
Scraping Tax Foundation – Tax Data
Economic and Employment Factors:
Fetch Economic Data: Retrieves per capita income, median household income, unemployment, labor force, population below poverty, total poverty universe, median home value, and median gross rent for each ZIP code in Colorado (CO) and Utah (UT) from the Census API.
Generate Unemployment and Poverty Rate Data: Divide the unemployed count by the total labor force to get the unemployment rate, and divide the below-poverty count by the total poverty universe to get the poverty rate.
Clean Data: Remove redundant columns and replace negative and missing values with the median value. For specific before-and-after comparisons, refer to this GitHub page, where the CSV files labeled raw and clean are available for viewing.
Visualize Data in Several Formats to Explore Data: Several visualizations were made to compare different features of the data.
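The fetch, rate, and cleaning steps above can be sketched in plain Python. This is a minimal illustration rather than the project's actual code. The ACS variable codes, the helper names (parse_census_rows, rate, fill_with_median), and the sample payload are assumptions made for the example. The Census API returns a JSON list of lists with the header row first, and it reports unavailable estimates as large negative sentinel values such as -666666666.

```python
from statistics import median

# Assumed ACS 5-year variable codes; the source does not list the codes it used.
ACS_VARIABLES = {
    "B19301_001E": "per_capita_income",
    "B19013_001E": "median_household_income",
    "B23025_005E": "unemployed",
    "B23025_002E": "labor_force",
    "B17001_002E": "below_poverty",
    "B17001_001E": "poverty_universe",
    "B25077_001E": "median_home_value",
    "B25064_001E": "median_gross_rent",
}


def parse_census_rows(rows):
    """Turn the API's list of lists (header row first) into dicts with readable keys."""
    header, *data = rows
    names = [ACS_VARIABLES.get(col, col) for col in header]
    return [dict(zip(names, row)) for row in data]


def to_number(text):
    """Census values arrive as strings; None marks a missing value."""
    return None if text is None else float(text)


def rate(part, whole):
    """Return part / whole as a percentage, or None if either input is unusable."""
    if part is None or whole is None or part < 0 or whole <= 0:
        return None
    return 100.0 * part / whole


def fill_with_median(values):
    """Replace negative and missing entries with the median of the valid ones."""
    valid = [v for v in values if v is not None and v >= 0]
    if not valid:
        return list(values)
    fill = median(valid)
    return [v if (v is not None and v >= 0) else fill for v in values]


# Illustrative payload only, not real Census output.
sample = [
    ["B23025_005E", "B23025_002E", "B17001_002E", "B17001_001E",
     "B25077_001E", "zip code tabulation area"],
    ["50", "1000", "120", "1500", "300000", "00001"],
    ["20", "400", "30", "600", "-666666666", "00002"],
    ["10", "500", "45", "900", "500000", "00003"],
]

records = parse_census_rows(sample)
for rec in records:
    rec["unemployment_rate"] = rate(to_number(rec["unemployed"]),
                                    to_number(rec["labor_force"]))
    rec["poverty_rate"] = rate(to_number(rec["below_poverty"]),
                               to_number(rec["poverty_universe"]))

home_values = fill_with_median([to_number(r["median_home_value"]) for r in records])
```

Rates are expressed here as percentages, which is an assumption. Divide by 100 if fractions are preferred. ZIP codes are kept as strings so that leading zeros survive.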
Cost of Living and Housing and Demographics and Migration Trends:
Handling Issues and Noise in the Data:
We aggregated demographic, migration, and real estate data from the Census and Zillow APIs at the ZIP code level for Colorado and Utah. Negative or infeasible values in housing, rent, or income columns generally appeared in sparsely populated ZIPs and were removed rather than imputed to avoid skewing the analysis.
Understanding the Data:
Most features are numeric (e.g., home values, household incomes) with categorical identifiers (State, County, City, ZIP). Zillow median estimates use a maximum of 41 listings per ZIP, so they reflect only a portion of each market.
Basic Statistical Analysis:
We computed means, medians, variances, and standard deviations to gauge central tendency and spread. Median home values average $332,143, and median household incomes average $73,632. Both show wide variation and skewness across ZIP codes, indicating marked urban-rural differences.
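As a rough sketch of these summary statistics, the standard-library statistics module is enough. The summarize helper and the sample values are hypothetical and are not project data. The example uses the sample (n - 1) variance, which is an assumption, since the source does not say which form was used.

```python
import statistics


def summarize(values):
    """Return the central-tendency and spread measures named in the text."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


# Hypothetical per-ZIP median home values; one high value mimics an expensive ZIP.
home_values = [200_000, 250_000, 300_000, 900_000]
summary = summarize(home_values)

# A mean well above the median is a simple sign of right skew.
right_skewed = summary["mean"] > summary["median"]
```

Comparing the mean against the median is only a quick check for skew. A dedicated skewness statistic would be more rigorous.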
Correlations Between Features:
Median household income correlates strongly with median home values and rent, suggesting wealthier ZIP codes see higher housing costs. Population density and migration inflows also correlate with housing demand and pricing.
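A correlation of this kind can be illustrated with a small Pearson coefficient function. The source does not say which coefficient was used, so Pearson is an assumption here, and the pearson helper and the sample numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-ZIP values: home value rises in step with income.
incomes = [40_000, 60_000, 80_000, 100_000]
home_values = [200_000, 300_000, 400_000, 500_000]
r = pearson(incomes, home_values)
```

A value of r near 1 indicates a strong positive linear relationship, while a value near 0 indicates little linear relationship.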
Merging and Aggregating:
Data from the Census and Zillow was merged on ZIP codes for consistency. Negative or missing entries were dropped, leaving 601 valid rows. This cleaned dataset aligns with project objectives by enabling analyses of cost of living, demographic trends, and migration patterns across Colorado and Utah.
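The merge-and-drop step can be sketched as follows. The inner join, the column names, and the sample rows are assumptions for illustration. The source only states that the two datasets were merged on ZIP code and that negative or missing entries were dropped.

```python
def merge_on_zip(census_rows, zillow_rows):
    """Inner-join two lists of dicts on their 'zip' key."""
    zillow_by_zip = {row["zip"]: row for row in zillow_rows}
    return [
        {**row, **zillow_by_zip[row["zip"]]}
        for row in census_rows
        if row["zip"] in zillow_by_zip
    ]


def drop_invalid(rows, numeric_columns):
    """Keep only rows whose numeric columns are all present and non-negative."""
    return [
        row for row in rows
        if all(row.get(col) is not None and row[col] >= 0 for col in numeric_columns)
    ]


# Hypothetical rows; ZIPs are strings so leading zeros are preserved.
census = [
    {"zip": "00001", "median_household_income": 70_000},
    {"zip": "00002", "median_household_income": -666666666},
    {"zip": "00003", "median_household_income": 55_000},
]
zillow = [
    {"zip": "00001", "median_listing_price": 350_000},
    {"zip": "00002", "median_listing_price": 410_000},
]

merged = merge_on_zip(census, zillow)
clean = drop_invalid(merged, ["median_household_income", "median_listing_price"])
```

Dropping rows instead of imputing them keeps sparsely populated ZIPs from distorting averages, at the cost of a smaller dataset.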
Quality of Life and Environment Trends:
Hospital Data and Health Insurance Coverage Data:
Fetch Hospital Data: Retrieves and counts hospitals per ZIP code from CMS API.
Fetch Uninsured Rate Data: Gets uninsured population per ZIP code from Census API.
Convert ZIP Codes to Coordinates: Uses pgeocode to map ZIP codes to latitude/longitude.
Merge & Process Data: Combines hospital and uninsured rate data, sorting by ZIP code.
Generate Bar Graphs:
Hospital count per ZIP (Blue/Green)
Uninsured rate per ZIP (Red/Orange)
Generate Interactive Maps:
Hospital count markers using MarkerCluster()
Uninsured rate markers with always-visible labels
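The count-and-merge steps above can be sketched without any third-party packages. The record layout, the helper names, and the sample values are hypothetical. Keeping ZIPs that appear in only one dataset (an outer join) is an assumption, since the source does not say how unmatched ZIPs were handled.

```python
from collections import Counter


def count_hospitals(hospital_records):
    """Count how many hospital records fall in each ZIP code."""
    return Counter(rec["zip"] for rec in hospital_records)


def merge_by_zip(hospital_counts, uninsured_by_zip):
    """Combine both datasets into one list of rows sorted by ZIP code."""
    zips = sorted(set(hospital_counts) | set(uninsured_by_zip))
    return [
        {
            "zip": z,
            "hospitals": hospital_counts.get(z, 0),   # no record means zero hospitals
            "uninsured": uninsured_by_zip.get(z),     # None when no Census value exists
        }
        for z in zips
    ]


# Hypothetical inputs standing in for the CMS and Census responses.
hospitals = [
    {"name": "Hospital A", "zip": "00002"},
    {"name": "Hospital B", "zip": "00002"},
    {"name": "Hospital C", "zip": "00001"},
]
uninsured = {"00001": 120, "00003": 45}

table = merge_by_zip(count_hospitals(hospitals), uninsured)
```

The resulting rows can then be passed to a plotting or mapping library, for example as one bar or one marker per row.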
More Detailed Process and Before/After Processing Dataset:
Air Pollution Data and National Park Data:
Fetch Air Quality Data: Retrieves PM2.5, PM10, and Ozone levels per monitoring site.
Fetch National Park Data: Gets park locations.
Process & Merge Data: Combines pollution and park data for analysis.
Generate Bar Graphs:
Pollution levels by state (CO & UT)
Create Interactive Maps:
Pollution monitoring sites
National park locations
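One way to get the per-state pollution levels for the bar graphs is to average the readings of all monitoring sites in each state. Averaging is an assumption, since the source does not say how sites were aggregated, and the field names and readings below are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_by_state(readings, pollutant):
    """Average one pollutant across monitoring sites, grouped by state."""
    by_state = defaultdict(list)
    for reading in readings:
        value = reading.get(pollutant)
        if value is not None:  # skip sites that do not report this pollutant
            by_state[reading["state"]].append(value)
    return {state: mean(values) for state, values in by_state.items()}


# Hypothetical site readings, not actual EPA data.
readings = [
    {"state": "CO", "site": "site-1", "pm25": 6.0, "ozone": 0.04},
    {"state": "CO", "site": "site-2", "pm25": 8.0, "ozone": None},
    {"state": "UT", "site": "site-3", "pm25": 9.0, "ozone": 0.05},
]

pm25_by_state = mean_by_state(readings, "pm25")
ozone_by_state = mean_by_state(readings, "ozone")
```

Each resulting dictionary maps a state code to one bar height, so a CO versus UT comparison needs one call per pollutant.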
More Detailed Process and Before/After Processing Dataset:
See this page about the Air Pollution Data in Colorado before processing.
See this page about the Air Pollution Data in Colorado after processing.
See this page about the Air Pollution Data in Utah before processing.
See this page about the Air Pollution Data in Utah after processing.
See this page about the National Park Data in Colorado before processing.
See this page about the National Park Data in Colorado after processing.
See this page about the National Park Data in Utah before processing.
See this page about the National Park Data in Utah after processing.
State Policies and Taxation Trends:
Scrape Tax Data:
Extracts income tax, sales tax, and property tax rates from the Tax Foundation website.
Stores data in a pandas DataFrame.
Analyze & Visualize:
Income Tax: Compares individual and corporate tax rates.
Sales Tax: Creates a stacked bar chart for state & local sales tax rates.
Property Tax: Compares property tax rates as a percentage of home value.
Saves graphs as PNG files.
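The scraping step can be illustrated with the standard-library HTML parser. This sketch assumes the rates are published in an HTML table. The markup, column order, and values below are placeholders, not the Tax Foundation's actual page or figures, and the project itself stored the result in a pandas DataFrame and may have used a different scraping library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rate(text):
    """Convert a string such as '4.50%' to the float 4.5."""
    return float(text.strip().rstrip("%"))


# Placeholder markup and values for illustration only.
html = """<table>
<tr><th>State</th><th>State Rate</th><th>Avg. Local Rate</th></tr>
<tr><td>Example A</td><td>3.00%</td><td>4.50%</td></tr>
<tr><td>Example B</td><td>6.00%</td><td>1.25%</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
header, *body = parser.rows

sales_tax = [
    {
        "state": row[0],
        "state_rate": parse_rate(row[1]),
        "local_rate": parse_rate(row[2]),
        # Total height of the stacked bar: state rate plus local rate.
        "combined_rate": parse_rate(row[1]) + parse_rate(row[2]),
    }
    for row in body
]
```

The state_rate and local_rate fields correspond to the two segments of a stacked bar, and combined_rate is the full bar height.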
More Detailed Process and Before/After Processing Dataset:
See this page.
Economic and Employment Factors:
Cost of Living and Housing and Demographics and Migration Trends:
Quality of Life and Environment Trends:
State Policies and Taxation Trends:
See this page about Tax Data.