Completed:
Coursera Week 8
EX 7
Coursera Week 9 part 1
Lessons Learned:
K-means algorithm
Initialize cluster centroids (e.g. pick K random training examples)
Assign each data point to its closest centroid
Move each centroid to the mean of the points assigned to it
Repeat until the assignments stop changing (see the sketch below)
K - the number of clusters
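A minimal Octave sketch of the loop above (the synthetic data and variable names are my own, not from ex7):

    % two synthetic 2-D blobs, K = 2 clusters
    X = [randn(50,2); randn(50,2) + 5];
    K = 2;
    p = randperm(size(X,1));
    centroids = X(p(1:K), :);           % init: K random training examples
    idx = zeros(size(X,1), 1);
    for iter = 1:10
      for i = 1:size(X,1)               % assignment step: closest centroid
        d = sum(bsxfun(@minus, centroids, X(i,:)).^2, 2);
        [~, idx(i)] = min(d);
      end
      for k = 1:K                       % move step: mean of assigned points
        centroids(k,:) = mean(X(idx == k, :), 1);
      end
    end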
Applications:
well-separated clusters
not well-separated clusters, e.g. height/weight → t-shirt sizes S/M/L
Optimization objective (distortion): the average squared distance between each point and its assigned centroid (formula below)
Choose the number of clusters K with the elbow method: plot distortion vs. K and pick the K where the curve flattens out
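For reference, the distortion cost as defined in the lectures, where c^(i) is the index of the centroid assigned to x^(i):

    J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2

The assignment step minimizes J over the c^(i) with the centroids fixed; the move step minimizes J over the mu_k with the assignments fixed.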
Dimensionality Reduction
Data Compression
reduce data from 2D → 1D, 3D → 2D, etc
reduce training time
Data Visualization
Principal Component Analysis (PCA)
Find a lower-dimensional surface onto which to project the data, such that the sum of squared distances between the actual points and their projections (the projection error) is minimized
PCA is NOT Linear Regression
Should NOT be used to address overfitting; use regularization instead.
Principal Component Analysis (PCA) Algorithm
Data preprocessing: feature scaling / mean normalization
Compute the covariance matrix (n x n): Sigma = (1/m) * X' * X
Compute its eigenvectors via singular value decomposition (SVD): [U, S, V] = svd(Sigma)
The first K columns of U form Ureduced
Project: z = trans(Ureduced) * x
Reconstruction from the compressed representation: Xapprox = Ureduced * z (no transpose here; see the sketch below)
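A compact Octave sketch of the pipeline, assuming X is m x n with one example per row (the stand-in data is my own):

    X = randn(100, 5);                          % stand-in data: m = 100, n = 5
    mu = mean(X);
    X_norm = bsxfun(@minus, X, mu);             % mean normalization
    Sigma = (X_norm' * X_norm) / size(X, 1);    % n x n covariance matrix
    [U, S, V] = svd(Sigma);                     % columns of U = principal components
    K = 2;                                      % number of components to keep
    Ureduced = U(:, 1:K);                       % first K columns of U
    Z = X_norm * Ureduced;                      % z = trans(Ureduced) * x, row-wise
    X_approx = Z * Ureduced';                   % reconstruction (approximates X_norm)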
K - number of principal components
K should be the smallest value such that:
(average squared projection error) / (total variation in the data) ≤ 0.01
→ 99% of variance is retained (see the check below)
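In practice the lectures compute this from the diagonal of S returned by svd(Sigma): pick the smallest K with sum_{i=1..K} S(i,i) / sum_{i=1..n} S(i,i) ≥ 0.99. A minimal Octave check:

    s = diag(S);                             % singular values of Sigma
    variance_retained = cumsum(s) / sum(s);  % fraction retained for each K
    K = find(variance_retained >= 0.99, 1);  % smallest K retaining 99%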
Use PCA to speed up supervised learning: maps x to z
Given dataset (x1, y1), …(xm, ym)
Extract input: (x1...xm)
apply PCA to obtain (z1...zm)
new training set (z1, y1), (z2, y2), ..., (zm, ym) (see the sketch below)
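Reusing mu and Ureduced from the sketch above (the mapping should be learned on the training inputs only and then applied unchanged to new data; X_test is a hypothetical held-out set):

    Z = bsxfun(@minus, X, mu) * Ureduced;            % training inputs -> train on (Z, y)
    Z_test = bsxfun(@minus, X_test, mu) * Ureduced;  % same mapping for new data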
randi(range, size) → randi(10, 5): a 5 x 5 matrix of random integers from 1 to 10
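Quick Octave check (the two-element range form also works in both Octave and MATLAB):

    A = randi(10, 5);          % 5 x 5, integers in 1..10
    B = randi([5 10], 2, 3);   % 2 x 3, integers in 5..10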