Lab 5

Lab guidelines

1. Submit everything (including R code and any resulting plots) as a SINGLE PDF document named lastname.firstname_Lab5.pdf. Other formats are not accepted.

2. Each of the 3 clustering methods is worth 30 points, and the clarity of the overall report is worth 10 points.

3. For each of the 3 clustering methods, answer all the questions.

Clustering of the US Arrests data

1. Use the following R command to obtain the US Arrests data

>data(USArrests);

2. Try K-means clustering on the US Arrests data with different numbers of clusters, k.

a) Describe your findings at different k

b) In your opinion, which k works the best? Please give your explanation.

c) Plot your clustering results at the 'best' k (you can choose to plot the data on the two variables that give the best visualization, or use the top 2 principal components if you like).
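
As a starting point for part 2, here is a minimal sketch in R, not a model answer. It assumes the variables are standardized first (Assault is on a much larger scale than the others), uses the total within-cluster sum of squares as one possible way to compare values of k, and plots on the top 2 principal components. The range 1 to 8 and the value k = 4 are arbitrary placeholders, not recommendations.

```r
# Load the data and standardize each variable (an assumption, not a requirement).
data(USArrests)
x <- scale(USArrests)

set.seed(1)  # k-means uses random starting centers; fix the seed to reproduce results

# Total within-cluster sum of squares for k = 1, ..., 8 (range chosen arbitrarily).
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Fit one particular k (placeholder value) and plot it on the top 2 principal components.
k   <- 4
fit <- kmeans(x, centers = k, nstart = 25)
pc  <- prcomp(x)$x[, 1:2]
plot(pc, col = fit$cluster, pch = 19, xlab = "PC1", ylab = "PC2",
     main = paste("K-means clustering, k =", k))
text(pc, labels = rownames(USArrests), pos = 3, cex = 0.6)
```

The cluster sizes and centers are available as fit$size and fit$centers, which may help when describing the findings at each k.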

3. Try agglomerative clustering on the US Arrests data.

a) Plot the dendrogram

b) State the optimum height at which to cut the dendrogram for clustering. Does this agree with your result in 2 b)?
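
For part 3, one possible sketch uses hclust from base R. Standardizing the data, the Euclidean distance, and complete linkage are all assumptions here; other linkages (for example "average" or "single") may give a different tree. The cut height h = 4 is a hypothetical value used only to show the mechanics.

```r
data(USArrests)
x <- scale(USArrests)                  # assumption: standardize the variables first

d  <- dist(x)                          # Euclidean distances between states
hc <- hclust(d, method = "complete")   # assumption: complete linkage

# Dendrogram with a dashed line at the (hypothetical) cut height.
plot(hc, cex = 0.6, main = "Agglomerative clustering", xlab = "", sub = "")
h <- 4
abline(h = h, lty = 2)

# Cluster membership obtained by cutting the tree at height h.
groups <- cutree(hc, h = h)
table(groups)
```

The number of clusters produced by a given height is length(unique(groups)), which can be compared with the k chosen in 2 b).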

4. Try divisive clustering on the US Arrests data.

a) Plot the dendrogram

b) State the optimum height at which to cut the dendrogram for clustering. Does this agree with your result in 3 b)?
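
For part 4, a possible sketch uses diana from the cluster package, which is one divisive method available in R; the lecture notes may use a different function. Standardizing the data is again an assumption, and h = 4 is a hypothetical cut height.

```r
library(cluster)                 # provides diana() and pltree()

data(USArrests)
x <- scale(USArrests)            # assumption: standardize the variables first

dv <- diana(x)                   # divisive analysis, Euclidean metric by default

# Dendrogram of the divisive hierarchy.
pltree(dv, cex = 0.6, main = "Divisive clustering (DIANA)", xlab = "", sub = "")

# Convert to an hclust object so the tree can be cut at a height.
h <- 4                           # hypothetical cut height
groups <- cutree(as.hclust(dv), h = h)
table(groups)
```

The divisive coefficient dv$dc summarizes how strong the clustering structure is and may be worth reporting alongside the dendrogram.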

Sample code for k-means, agglomerative, and divisive clustering can be found in the lecture notes. To start, a sample of R code is here.
