The k-nearest neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. It is easy to implement and understand. This document looks at its role in current data science.
The KNN algorithm assumes that similar things exist in close proximity; in other words, similar points lie near each other in feature space.
It may happen that, for a given query point, K-nn finds that more than one class has the same number of points among the k neighbours. Which class should be selected then?
Since the number of points in each class grows as the total number of points grows, a simple approach is to increase K by 1 whenever a tie is encountered and vote again; a minimal sketch of this idea is shown below.
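The following is a rough sketch (not from the original article) of that tie-breaking idea: classify by majority vote among k neighbours, and retry with k + 1 whenever two or more classes tie.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by majority vote among its k nearest training
    points, increasing k by 1 whenever the vote is tied."""
    y_train = np.asarray(y_train)
    distances = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    order = np.argsort(distances)                     # neighbours, nearest first
    while k <= len(y_train):
        votes = Counter(y_train[order[:k]])
        top_two = votes.most_common(2)
        if len(top_two) == 1 or top_two[0][1] > top_two[1][1]:
            return top_two[0][0]                      # clear majority found
        k += 1                                        # tie: look one neighbour further
    return votes.most_common(1)[0][0]                 # fall back to the last vote
```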
The optimal K value depends on the dataset. The normally used approach for choosing it is cross-validation. It is also better to use an odd value of K, given the tie problem mentioned above.
An analysis based on the Bayes decision rule helps to understand this problem in detail. Using the k-NN density estimate, the class-conditional probability density (PDF) of the population can be approximated, and the class prior probabilities can be estimated from the n training points; with both of these, the Bayes decision rule can be applied to k-NN.
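A rough sketch of that argument, in standard k-NN density-estimation notation (not taken verbatim from the linked lecture notes): of the k nearest neighbours of x, contained in a ball of volume V, suppose k_c belong to class c, and the training set has n points with n_c in class c.

```latex
p(x \mid c) \approx \frac{k_c}{n_c V}, \qquad
P(c) \approx \frac{n_c}{n}, \qquad
p(x) \approx \frac{k}{n V}
% Bayes' rule then gives the posterior, and the Bayes decision rule picks the
% class with the largest posterior -- which is exactly the k-NN majority vote:
P(c \mid x) = \frac{p(x \mid c)\, P(c)}{p(x)} \approx \frac{k_c}{k},
\qquad \hat{c} = \arg\max_c k_c .
```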
The value of k is a hyperparameter and should be tuned using grid search or other search approaches, for example as in the sketch below.
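A hedged sketch of tuning k with cross-validated grid search in scikit-learn; the dataset (load_iris) and the candidate k values are illustrative choices only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features first: k-NN is distance based, so unscaled features dominate.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Odd values of k only, to reduce the chance of ties (see above).
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 30, 2))}

search = GridSearchCV(pipe, param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print("best k:", search.best_params_, "CV accuracy:", search.best_score_)
```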
The error rate at K=1 is always zero for the training sample. This is because the closest point to any training data point is itself.
However, kNN with k=1 generally leads to over-fitting. Note that you estimate the class probability from a single sample: your closest neighbor. This is very sensitive to all sorts of distortions such as noise, outliers, mislabelled data, and so on. By using a higher value of k, you tend to be more robust against those distortions, as the small experiment below illustrates.
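A small illustration (the synthetic dataset and the compared k values are assumptions, not from the cited discussion): with k=1 the training accuracy is always 1.0, but held-out accuracy is typically lower, especially on noisy data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with deliberate label noise (flip_y) to make over-fitting visible.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in (1, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:2d}  train acc={clf.score(X_tr, y_tr):.2f}  test acc={clf.score(X_te, y_te):.2f}")
```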
Use the right distance metric. Mahalanobis distance is a good choice; refer to the linked article. A sketch of using it with scikit-learn follows.
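A hedged sketch of using the Mahalanobis distance with scikit-learn's k-NN; the dataset is an illustrative choice, and the inverse covariance matrix is estimated directly from the data here (in practice it may need regularisation).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
VI = np.linalg.inv(np.cov(X, rowvar=False))      # inverse covariance matrix of the features

clf = KNeighborsClassifier(
    n_neighbors=5,
    metric="mahalanobis",
    metric_params={"VI": VI},
    algorithm="brute",                           # brute-force search supports this metric
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```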
Neural networks have achieved state-of-the-art results in more domains than k-NN, but in some cases k-NN still gives better accuracy.
It is popular for running ML on edge devices (IoT), since it is simple and has good predictive power.
https://youtu.be/DlQli0OCkf8
https://youtu.be/DlQli0OCkf8?t=1970
http://faculty.washington.edu/yenchic/18W_425/Lec7_knn_basis.pdf
https://discuss.analyticsvidhya.com/t/how-to-choose-the-value-of-k-in-knn-algorithm/2606/7
https://stats.stackexchange.com/questions/107870/does-k-nn-with-k-1-always-implies-overfitting
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-normalisation-in-machine-learning#TOC-Case-when-feature-scaling-is-not-needed
https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761