1. Car prices by year in the current US market
For your favorite car model (e.g., Ford Taurus), find the average price of cars of that model manufactured in each year from 1997 to 2015 (in the US). You can collect this from websites such as cars.com or truecar.com. When you process the data, try to store other information about each car, such as mileage and condition; you may need it for project 1. Alternatively, you can simply use the data in the given example here, or use a dataset from the UC Irvine Machine Learning Repository to carry out a linear regression and answer similar questions. The requirements are as follows:
a) A detailed description of how you obtained the data (be cautious about potential bias in data collection).
b) For each year, find the prices of at least 100 cars, then calculate the average
(click here for an example of how to extract car prices from a messy text file)
c) Produce a year-price scatter plot (click here for an example)
d) Identify the years during which the decline in prices slows down. If you wanted to buy a used car, or if you had a new car, when would you buy or sell it? Why?
e) Carry out a linear regression of average price vs. year
Produce another scatter plot and add the regression line
Report the output of the linear regression
g) Include the data (only the average prices and the years) as part of your submission
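As a rough illustration of steps b) and e), the sketch below pulls a model year and a price out of free-text listing lines, averages the prices per year, and fits a least-squares line to the averages. The listing format, the regular expressions, and the function names `average_price_by_year` and `fit_line` are assumptions made for this example; they are not taken from the linked examples, and real listings will need different patterns. The sample lines are made up and far smaller than the 100 cars per year the assignment requires.

```python
import re
from collections import defaultdict

# Hypothetical listing format: a model year (1997-2015) and a "$price"
# somewhere in each line. Adjust both patterns to the data you collect.
YEAR_RE = re.compile(r"\b(199[7-9]|20(?:0\d|1[0-5]))\b")
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*)")

def average_price_by_year(lines):
    """Return {year: mean price} from free-text listing lines."""
    prices = defaultdict(list)
    for line in lines:
        year = YEAR_RE.search(line)
        price = PRICE_RE.search(line)
        if year and price:  # skip lines missing either field
            value = float(price.group(1).replace(",", ""))
            prices[int(year.group(1))].append(value)
    return {y: sum(p) / len(p) for y, p in sorted(prices.items())}

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up sample lines for illustration only, not real market data.
sample = [
    "2010 Ford Taurus SEL, 85,000 miles - $8,000",
    "2010 Ford Taurus, good condition, $9,000",
    "2012 Ford Taurus Limited $12,500 (62k mi)",
    "2014 Ford Taurus SHO, $16,500",
    "contact dealer for price",
]
averages = average_price_by_year(sample)
slope, intercept = fit_line(list(averages), list(averages.values()))
print(averages)
print(f"price = {slope:.1f} * year + {intercept:.1f}")
```

The fit is written out by hand so the sketch needs only the standard library; a statistics package would additionally report standard errors and R-squared, which the regression output in e) would normally include.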
2. Global terrorism data
This is a dataset from kaggle.com consisting of more than 150,000 terrorist attacks during 1970-2015. Here is the link for the data.
a) A detailed description of the dataset.
Define major attacks as those involving more than 10 casualties, small attacks as those involving 3-10 casualties, and minor attacks otherwise. For each of minor, small, and major attacks, complete b-d).
b) Produce a scatter plot of year vs. number of attacks for major attacks and minor attacks, respectively.
c) Identify any years in which the trend of number of attacks vs. year changes
d) Carry out a linear regression of number of attacks vs. year
Produce another scatter plot and add the regression line
Report the output of the linear regression.
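The casualty thresholds defined above can be turned into per-year attack counts along the following lines. This is a minimal sketch: the `(year, casualties)` record format and the function names `classify` and `attacks_per_year` are assumptions for illustration, and how a casualty count is derived from the dataset's columns is left to you. The records shown are made up.

```python
from collections import Counter

def classify(casualties):
    """Label an attack by casualty count: >10 major, 3-10 small, else minor."""
    if casualties > 10:
        return "major"
    if casualties >= 3:
        return "small"
    return "minor"

def attacks_per_year(records):
    """Return {category: Counter({year: number of attacks})}."""
    counts = {"minor": Counter(), "small": Counter(), "major": Counter()}
    for year, casualties in records:
        counts[classify(casualties)][year] += 1
    return counts

# Made-up (year, casualties) pairs for illustration only.
records = [
    (1970, 0), (1970, 12), (1970, 5),
    (1971, 2), (1971, 40), (1971, 11),
    (1972, 10),
]
counts = attacks_per_year(records)
for category, by_year in counts.items():
    print(category, sorted(by_year.items()))
```

Each category's `(year, count)` pairs can then be used for the scatter plot in b) and the regression in d).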