This dataset was built from used-car listings that were scraped from Craigslist and published on Kaggle. Because Craigslist hosts the largest collection of used-vehicle listings in the world, the dataset provides enough rows for analysis and yields insights that reflect actual trends in the used-car market. The Kaggle dataset is refreshed with new Craigslist data every few months; to ensure that this project works with the most recent data, the latest version was downloaded for this investigation.
The dataset contains 426,880 listings and 26 columns, including id, url, region, region_url, price, year, manufacturer, model, condition, cylinders, fuel, odometer, title_status, transmission, VIN, drive, size, type, paint_color, image_url, description, county, state, and posting_date. An initial inspection showed that some columns are irrelevant to the analysis and that others consist mostly of missing (NaN) values; these columns will be dropped. The dataset also contains several outliers that need to be handled. A later section covers both steps in more detail.
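To make the two cleaning steps concrete, the following is a minimal sketch in Python with pandas, not the exact procedure used in this project. It runs on a small made-up table rather than the real listings, and the helper names `drop_sparse_columns` and `remove_iqr_outliers`, the 50% missing-value threshold, and the 1.5 × IQR rule are all illustrative assumptions.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing=0.5):
    """Drop columns whose share of missing values exceeds max_missing (assumed threshold)."""
    missing_share = df.isna().mean()  # fraction of NaN values per column
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]


def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value in `column` lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Toy stand-in for the listings table; the values are invented for illustration.
listings = pd.DataFrame({
    "price": [4500, 7200, 9800, 12500, 15000, 3_000_000],
    "odometer": [120000, 95000, 80000, 60000, 45000, 70000],
    "county": [np.nan] * 6,
    "size": ["compact", np.nan, np.nan, np.nan, "full-size", np.nan],
})

cleaned = drop_sparse_columns(listings, max_missing=0.5)
cleaned = remove_iqr_outliers(cleaned, "price")
print(cleaned)
```

In this toy table, the mostly empty columns are dropped and the implausible 3,000,000 price is filtered out. On the real data, the threshold and the outlier rule would need to be chosen per column after inspecting the distributions.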