This project uses two datasets, one of used car listings and one of U.S. state population estimates for 2010 to 2019, to explore how population size relates to car prices across states. Links to both datasets are below, followed by an initial exploration of the data.
https://www.kaggle.com/datasets/doaaalsenani/usa-cers-dataset
https://www.kaggle.com/datasets/kbrookshier/us-census-state-population-estimates-20102019
Raw Data Preview
This dataset provides annual population estimates for each U.S. state from 2010 to 2019, with one population count per state per year, so population trends can be tracked over time. It can be used to compare growth patterns across states, to see how population shifts between states, and to look for possible correlations with economic measures such as car prices.
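As a rough illustration of how growth patterns could be read from these estimates, the sketch below computes the overall percent change between two years and the year-over-year percent change for a single state. It is written in plain Python rather than the R the project appears to use, and the population figures are hypothetical placeholders, not values from the census file.

```python
def percent_growth(estimates: dict[int, int], start: int, end: int) -> float:
    """Percent change in population from year `start` to year `end`."""
    return (estimates[end] - estimates[start]) / estimates[start] * 100


def year_over_year(estimates: dict[int, int]) -> dict[int, float]:
    """Percent change for each year relative to the previous year."""
    years = sorted(estimates)
    return {
        year: (estimates[year] - estimates[prev]) / estimates[prev] * 100
        for prev, year in zip(years, years[1:])
    }


# Hypothetical estimates for one state (NOT real census figures).
example_state = {2010: 1_000_000, 2011: 1_020_000, 2012: 1_040_400}

print(round(percent_growth(example_state, 2010, 2012), 2))  # 4.04
print({y: round(p, 2) for y, p in year_over_year(example_state).items()})
# {2011: 2.0, 2012: 2.0}
```

In the wide census file, each state's row would first need to be reshaped into a year-to-population mapping like example_state; the exact column names of that file are not assumed here.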
To prepare the data for analysis, I first merged the two datasets on the state column so that each car listing is paired with its state's population. I then removed duplicate rows so that no listing is counted twice, and applied na.omit() to drop every row with a missing value. Next, I converted the year, price, mileage, and Population columns to their appropriate numeric types. I also created a new variable, car_age, by subtracting each car's model year from the current year (2025). Finally, I removed columns that are not relevant to the analysis, such as the identifier columns vin and lot.
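The cleaning steps above can be sketched as a small pipeline. The write-up mentions na.omit(), which suggests the original work was done in R; this is only a minimal plain-Python equivalent under stated assumptions: rows are dictionaries, missing values are None or empty strings, state names are spelled identically in both datasets, and the merge is an inner join (the default behaviour of R's merge(), if that is what was used). All sample rows are made up for illustration.

```python
CURRENT_YEAR = 2025            # reference year for car_age, as stated in the text
DROP_COLUMNS = {"vin", "lot"}  # identifier columns removed before analysis


def merge_on_state(cars, populations):
    """Inner join on 'state': cars from states with no population entry are dropped.
    Assumes state names are spelled identically in both datasets."""
    pop_by_state = {row["state"]: row["Population"] for row in populations}
    return [
        {**car, "Population": pop_by_state[car["state"]]}
        for car in cars
        if car["state"] in pop_by_state
    ]


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_incomplete(rows):
    """Rough analogue of na.omit(): drop rows with any missing value."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def convert_types(row):
    """Cast text fields to numbers (column names assumed from the text)."""
    out = dict(row)
    out["year"] = int(row["year"])
    out["price"] = float(row["price"])
    out["mileage"] = float(row["mileage"])
    out["Population"] = int(row["Population"])
    return out


def clean(cars, populations):
    """Merge, deduplicate, drop incomplete rows, cast types, add car_age, drop ids."""
    rows = drop_incomplete(drop_duplicates(merge_on_state(cars, populations)))
    cleaned = []
    for row in rows:
        row = convert_types(row)
        row["car_age"] = CURRENT_YEAR - row["year"]
        cleaned.append({k: v for k, v in row.items() if k not in DROP_COLUMNS})
    return cleaned


# Hypothetical sample rows (values are strings, as if read from a CSV file).
cars = [
    {"price": "10000", "brand": "toyota", "year": "2015", "mileage": "50000",
     "vin": "VIN1", "lot": "1", "state": "texas"},
    {"price": "10000", "brand": "toyota", "year": "2015", "mileage": "50000",
     "vin": "VIN1", "lot": "1", "state": "texas"},    # exact duplicate
    {"price": "8000", "brand": "ford", "year": "2012", "mileage": None,
     "vin": "VIN2", "lot": "2", "state": "ohio"},     # missing mileage
    {"price": "9000", "brand": "dodge", "year": "2018", "mileage": "30000",
     "vin": "VIN3", "lot": "3", "state": "nowhere"},  # no population match
]
populations = [{"state": "texas", "Population": "1000"},
               {"state": "ohio", "Population": "500"}]

result = clean(cars, populations)
print(len(result), result[0]["car_age"])  # 1 10
```

Only the Texas listing survives: the duplicate is collapsed, the Ohio row is dropped for its missing mileage, and the unmatched state disappears in the inner join.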
This bar chart shows how many listings of each car brand appear in each state. It makes it possible to see which brands are most common in particular states, for example whether Toyota or Ford listings dominate in a given state.
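The counts behind a chart like this come from tallying (state, brand) pairs. A minimal plain-Python sketch, assuming cleaned rows that carry state and brand keys; the rows below are hypothetical.

```python
from collections import Counter


def brand_counts_by_state(rows: list[dict]) -> dict[str, dict[str, int]]:
    """Return {state: {brand: number_of_listings}}."""
    pair_counts = Counter((row["state"], row["brand"]) for row in rows)
    by_state: dict[str, dict[str, int]] = {}
    for (state, brand), n in pair_counts.items():
        by_state.setdefault(state, {})[brand] = n
    return by_state


def most_common_brand(by_state: dict[str, dict[str, int]], state: str) -> str:
    """Brand with the most listings in `state` (ties go to the first one seen)."""
    brands = by_state[state]
    return max(brands, key=brands.get)


# Hypothetical listings, not taken from the Kaggle data.
rows = [
    {"state": "texas", "brand": "ford"},
    {"state": "texas", "brand": "ford"},
    {"state": "texas", "brand": "toyota"},
    {"state": "ohio", "brand": "toyota"},
]
counts = brand_counts_by_state(rows)
print(counts)  # {'texas': {'ford': 2, 'toyota': 1}, 'ohio': {'toyota': 1}}
print(most_common_brand(counts, "texas"))  # ford
```

Each inner dictionary corresponds to one group of bars in the chart.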
This histogram shows the distribution of car prices in the dataset: each bar counts the cars whose price falls within a given range. It helps show whether most listings fall within a particular price range, for example among lower-priced cars or among luxury vehicles.
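A histogram is simply a count of values per fixed-width bin. The sketch below assumes a bin width of 5,000 (an arbitrary choice, not taken from the original chart) and also reports the busiest bin together with the share of listings it holds, which is one way to check whether most cars fall into a single price range. The prices are hypothetical.

```python
def price_histogram(prices: list[float], bin_width: float) -> dict[float, int]:
    """Count prices per bin; keys are bin lower edges. Empty bins are omitted."""
    counts: dict[float, int] = {}
    for price in prices:
        lower = (price // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


def busiest_bin(hist: dict[float, int]) -> tuple[float, float]:
    """Lower edge of the fullest bin and the fraction of all listings in it."""
    lower = max(hist, key=hist.get)
    return lower, hist[lower] / sum(hist.values())


# Hypothetical prices, not taken from the Kaggle data.
hist = price_histogram([0, 2500, 4999, 5000, 12000], bin_width=5000)
print(hist)               # {0: 3, 5000: 1, 10000: 1}
print(busiest_bin(hist))  # (0, 0.6)
```

A plotting library would normally do the binning itself; spelling it out only makes explicit what each bar in the histogram counts.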