Assignment IV: Data Wrangling

In this assignment, we will use the cars.csv dataset.

The main goal of this exercise is to see whether cars can be grouped and clustered by origin based on their attributes.

Since the running time can be quite long, try to run the code for only one or two tasks at a time.

1. Read the CSV file and set the first column as the index (use index_col). Then save all the car attributes other than "Origin" in a new variable called X, and save "Origin" in another new variable called y.
2. Perform SVD on the matrix X, then print the matrices U, s, and Vh. Discard the residual data (anything less than 1%), save what remains in a new variable, then reconstruct the data from it. Print the result.
3. Perform Factor Analysis on X. Print the result, then determine the right number of components.
4. Perform PCA on X. Print the result, then estimate the optimal number of components from it.
5. Perform the t-SNE algorithm on X. Set the initial perplexity, early_exaggeration, and n_iter appropriately, then plot a 2D graph of the result.
6. Perform k-means clustering with k = 3 on X. (Try applying PCA and scaling to the data beforehand.) Print the cross-tabulation with the original y.
7. Perform DBSCAN on X and adjust eps and min_samples accordingly. (Try applying PCA and scaling to the data beforehand.)
8. Perform agglomerative clustering on X and set the appropriate parameters. Then print the cross-tabulation with the original y.
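
As a starting point, the sketch below walks through three of the tasks above: splitting the data into X and y, the SVD truncation and reconstruction, and k-means followed by a cross-tabulation. It is a rough illustration, not a reference solution. Because cars.csv is not reproduced here, the code builds a small synthetic table in its place; the column names (mpg, horsepower, weight) and the origin labels A, B, and C are made up for the example. The 1% cutoff is read here as "drop singular values that contribute less than 1% of the sum of all singular values", which is only one possible interpretation of the task.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for cars.csv (hypothetical columns and labels).
# With the real file you would instead use:
#   cars = pd.read_csv("cars.csv", index_col=0)
rng = np.random.default_rng(0)
means = {"A": [18, 150, 3500], "B": [27, 90, 2400], "C": [31, 75, 2200]}
frames = []
for origin, mean in means.items():
    block = pd.DataFrame(
        rng.normal(mean, [2, 10, 150], size=(40, 3)),
        columns=["mpg", "horsepower", "weight"],
    )
    block["Origin"] = origin
    frames.append(block)
cars = pd.concat(frames, ignore_index=True)

# Split into attributes X and target y.
X = cars.drop(columns="Origin")
y = cars["Origin"]

# Thin SVD: X = U @ diag(s) @ Vh.
U, s, Vh = np.linalg.svd(X.to_numpy(), full_matrices=False)
print("singular values:", s)

# Assumed reading of the 1% rule: keep components whose singular value
# is at least 1% of the sum of all singular values.
keep = s / s.sum() >= 0.01
X_reduced = (U[:, keep] * s[keep]) @ Vh[keep, :]
print("components kept:", int(keep.sum()), "of", len(s))
print(pd.DataFrame(X_reduced, columns=X.columns).head())

# Scale, project with PCA, then run k-means with k = 3.
scaled = StandardScaler().fit_transform(X)
projected = PCA(n_components=2).fit_transform(scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(projected)

# Cross-tabulate cluster labels against the original origins.
table = pd.crosstab(
    labels, y.to_numpy(), rownames=["cluster"], colnames=["Origin"]
)
print(table)
```

The cluster numbers that k-means assigns are arbitrary, so read the cross-tabulation by looking at which origin dominates each row rather than expecting cluster 0 to match any particular origin. The same scaling, PCA, and cross-tabulation pattern can be reused for the DBSCAN and agglomerative clustering tasks by swapping in a different estimator.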
