Week 9
Anomaly Detection: Model p(x) from the data. If p(x_test) < epsilon, flag x_test as an anomaly. epsilon is a small threshold.
Used for:
Fraud detection
Manufacturing: quality control
Monitoring computers in data centers
Gaussian (normal) Distribution
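The probability density of x at a given value, given the mean mu and the variance sigma_sqr (the square of the standard deviation sigma):
p(x; mu, sigma_sqr) = (1 / sqrt(2 * pi * sigma_sqr)) * exp(-(x - mu)^2 / (2 * sigma_sqr))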
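A minimal sketch of the univariate Gaussian density in Python. The function name gaussian_pdf is made up for these notes; the argument names follow the mu / sigma_sqr notation above.

```python
import math

def gaussian_pdf(x, mu, sigma_sqr):
    """Density of a univariate Gaussian with mean mu and variance sigma_sqr."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma_sqr)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma_sqr))

# The density peaks at the mean and falls off symmetrically on both sides.
peak = gaussian_pdf(0.0, 0.0, 1.0)
tail = gaussian_pdf(3.0, 0.0, 1.0)
```

Values far from mu get a small density, which is what the epsilon threshold picks up.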
Parameter estimation: calculate mu and sigma_sqr from a given data set.
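A sketch of parameter estimation with NumPy, followed by the epsilon test. Assumptions not stated in the notes: each feature is modeled as an independent Gaussian, so p(x) is the product of the per-feature densities, and the variance divides by m (the maximum-likelihood estimate). The data and the epsilon value are invented for illustration.

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance of X, where X has shape (m, n)."""
    mu = X.mean(axis=0)
    sigma_sqr = X.var(axis=0)  # ddof=0: divides by m, not m - 1
    return mu, sigma_sqr

def density(X, mu, sigma_sqr):
    """p(x) per row, assuming independent Gaussian features."""
    per_feature = np.exp(-((X - mu) ** 2) / (2.0 * sigma_sqr)) / np.sqrt(
        2.0 * np.pi * sigma_sqr
    )
    return per_feature.prod(axis=1)

# Toy training set (made up): 3 examples, 2 features.
X_train = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
mu, sigma_sqr = estimate_gaussian(X_train)

# Flag test points whose density falls below a hypothetical threshold.
X_test = np.array([[2.0, 12.0], [10.0, 30.0]])
epsilon = 1e-3
flags = density(X_test, mu, sigma_sqr) < epsilon
```

The first test point sits at the mean and is not flagged; the second is far from the training data and is flagged.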
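If some labeled data (known anomalies) is available: still fit p(x) without using the labels, but use the labeled examples to evaluate the detector and choose epsilon. Use precision/recall/F1 score rather than accuracy, because anomalies are rare and the classes are skewed.
If no labeled data: unsupervised machine learning. Fit p(x) on the unlabeled data and flag low-probability examples.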
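A small sketch of why accuracy is misleading on skewed data. The helper precision_recall_f1 and the label vectors are made up for illustration; 1 marks an anomaly.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 = anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)

# 98 normal examples and 2 anomalies.
y_true = [0] * 98 + [1] * 2

# A useless detector that never flags anything still gets 98% accuracy.
always_normal = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
scores = precision_recall_f1(y_true, always_normal)
```

The detector that never flags anything scores 0.98 on accuracy but 0.0 on F1, which is why F1 is the better check here.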
Anomaly detection vs. supervised learning.
Anomaly detection: when there are very few positive examples (anomalies) and many negative ones, e.g. fraud detection, monitoring of power grid, etc.
Supervised learning: when there are plenty of both positive and negative examples, e.g. classification problems, weather predictions
Recommendation problem: can be treated as a linear regression problem
Collaborative filtering → low-rank matrix factorization
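A sketch of what low-rank factorization means here, under assumed names and sizes (X for item features, Theta for user parameters, 4 items, 3 users, 2 latent features, all hypothetical): the full matrix of predicted ratings is the product of two thin matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 items, 3 users, 2 latent features.
X = rng.normal(size=(4, 2))      # one row of features per item
Theta = rng.normal(size=(3, 2))  # one row of parameters per user

# Entry (i, j) is the predicted rating of item i by user j.
predicted = X @ Theta.T

# The 4 x 3 prediction matrix has rank at most 2, the number of features.
rank = np.linalg.matrix_rank(predicted)
```

Each entry predicted[i, j] is the dot product of item i's features with user j's parameters, i.e. a linear regression prediction for that user.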
Implementation:
Given Data: Calculate:
Because anomalies are rare and the classes are skewed, use the F1 score (not accuracy) to check the performance of the prediction and to choose epsilon
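One way to pick epsilon with F1, sketched under assumptions: a labeled validation set is available (y_val, 1 = anomaly) along with the densities p_val for it, and candidate thresholds are scanned on an evenly spaced grid. The function name select_threshold, the grid size, and the toy data are not from the notes.

```python
import numpy as np

def select_threshold(y_val, p_val, num_steps=1000):
    """Return the (epsilon, F1) pair with the highest F1 on the validation set."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), num_steps):
        pred = p_val < epsilon                 # True means flagged as anomaly
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue                           # F1 is 0 or undefined here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2.0 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_epsilon, best_f1 = epsilon, f1
    return best_epsilon, best_f1

# Toy validation data (made up): the two anomalies have very low density.
p_val = np.array([0.001, 0.002, 0.2, 0.3, 0.25, 0.4])
y_val = np.array([1, 1, 0, 0, 0, 0])
epsilon, f1 = select_threshold(y_val, p_val)
```

On this toy data the chosen epsilon lands between the anomaly densities and the normal ones, so both anomalies are flagged with no false positives.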
Calculate unregularized cost
Calculate unregularized gradient for GD
Calculate regularized cost
Calculate regularized gradient for GD
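The notes do not give the formulas for these four steps. The sketch below assumes they refer to the collaborative filtering objective: squared error over the rated entries plus an L2 penalty on both factor matrices. All names (cofi_cost_and_grad, X, Theta, Y, R, lam) are assumptions; setting lam = 0 gives the unregularized cost and gradients.

```python
import numpy as np

def cofi_cost_and_grad(X, Theta, Y, R, lam=0.0):
    """Collaborative filtering cost and gradients.

    X:     (num_items, k) item features
    Theta: (num_users, k) user parameters
    Y:     (num_items, num_users) ratings
    R:     same shape as Y, 1 where a rating exists, else 0
    lam:   regularization strength; 0 means unregularized
    """
    err = (X @ Theta.T - Y) * R  # errors on rated entries only
    cost = 0.5 * np.sum(err ** 2)
    cost += (lam / 2.0) * (np.sum(Theta ** 2) + np.sum(X ** 2))
    X_grad = err @ Theta + lam * X
    Theta_grad = err.T @ X + lam * Theta
    return cost, X_grad, Theta_grad

# Toy problem (made up): 3 items, 2 users, 2 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 2))
Theta = rng.normal(size=(2, 2))
Y = rng.normal(size=(3, 2))
R = np.array([[1, 0], [1, 1], [0, 1]])

cost_unreg, X_grad_unreg, Theta_grad_unreg = cofi_cost_and_grad(X, Theta, Y, R)
cost_reg, X_grad_reg, Theta_grad_reg = cofi_cost_and_grad(X, Theta, Y, R, lam=1.5)

# One gradient descent step with a hypothetical learning rate.
alpha = 0.01
X_next = X - alpha * X_grad_reg
Theta_next = Theta - alpha * Theta_grad_reg
```

A finite-difference comparison against the returned gradients is a cheap way to confirm the gradient code before running gradient descent.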