K-Nearest Neighbors (KNN)
Introduction
K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms. It classifies new data points based on the majority class among the k nearest neighbors in the training set. Despite its simplicity, KNN can be surprisingly effective for many real-world problems.
KNN is a "lazy learning" algorithm because it doesn't build an explicit model during training. Instead, it stores all training data and makes predictions by examining the local neighborhood of each query point.
How KNN Works
1. Store Training Data
KNN simply stores all training examples without building an explicit model.
2. Calculate Distances
For a new query point, calculate distances to all training points using metrics like Euclidean or Manhattan distance.
3. Find K Nearest Neighbors
Select the k training points with the smallest distances to the query point.
4. Make Prediction
For classification: majority vote among k neighbors. For regression: average of k neighbors' values.
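The four steps above map almost directly onto code. Here is a minimal from-scratch sketch in Python; it assumes NumPy is available, and the function name `knn_predict` and the toy data are purely illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Step 2: Euclidean distances from the query to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the labels of those neighbors
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy example: two features, two classes
X_train = np.array([[1.0, 2.0], [2.0, 1.5], [6.0, 5.0], [7.0, 6.5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([6.5, 5.5]), k=3))  # -> 1
```

In practice a library implementation such as scikit-learn's `KNeighborsClassifier` would replace this sketch, but the steps it performs are the same.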
Interactive KNN Classifier
(Interactive demo: a scatter plot of three training classes: red (class 0), blue (class 1), and green (class 2). Click to place a black query point, right-click to add new training points; the k selected neighbors are highlighted and the resulting prediction is shown.)
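For readers of a static copy of this page, the sketch below reproduces the gist of the demo. It assumes scikit-learn and matplotlib are installed; the three-class blob dataset and the query point are made up for illustration. It fits a k = 5 classifier to three clusters, shades the decision regions, and marks a single query point:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Three synthetic clusters standing in for the red/blue/green classes
X, y = make_blobs(n_samples=150, centers=3, random_state=42)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Evaluate the classifier on a grid so the decision regions can be shaded
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300),
)
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")  # training points

query = X.mean(axis=0)  # an arbitrary query point inside the data range
print("predicted class:", clf.predict(query.reshape(1, -1))[0])
plt.scatter(query[0], query[1], c="black", marker="x", s=120)

plt.title("KNN decision regions (k = 5)")
plt.show()
```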
Choosing K
K = 1
- Very sensitive to noise
- Complex decision boundaries
- High variance, low bias
- Prone to overfitting
Optimal K
- Usually an odd number (to avoid tied votes)
- A common heuristic is k ≈ √n, where n is the training set size
- Found through cross-validation (see the sketch after these lists)
- Balances bias and variance
Large K
- Smooth decision boundaries
- Less sensitive to noise
- Low variance, higher bias
- Risk of underfitting
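To make the cross-validation bullet concrete, here is a sketch assuming scikit-learn and its bundled iris dataset: it scores odd values of k with 5-fold cross-validation and reports the best one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score odd values of k with 5-fold cross-validation and keep the best one
k_values = range(1, 32, 2)
scores = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in k_values
]
best_k = k_values[int(np.argmax(scores))]
print(f"best k = {best_k}, mean accuracy = {max(scores):.3f}")
```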
Distance Metrics
| Metric | Formula | Use Case | Properties |
|---|---|---|---|
| Euclidean | √Σ(xi - yi)² | Continuous features, geometric data | Straight-line distance, sensitive to scale |
| Manhattan | Σ‖xi - yi‖ | High-dimensional data, city-block distances | Less sensitive to outliers |
| Cosine | 1 - (A·B)/(‖A‖ ‖B‖) | Text classification, sparse data | Measures angle, not magnitude |
| Hamming | Σ(xi ≠ yi) | Categorical data, binary strings | Counts differing positions |
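All of these metrics are available off the shelf. The sketch below assumes SciPy and uses small made-up vectors to show how each one is computed (note that SciPy's `hamming` returns the fraction of differing positions rather than the raw count):

```python
from scipy.spatial import distance

a = [1.0, 0.0, 2.0]
b = [2.0, 1.0, 0.0]

print(distance.euclidean(a, b))  # straight-line distance: sqrt(1 + 1 + 4)
print(distance.cityblock(a, b))  # Manhattan distance: 1 + 1 + 2
print(distance.cosine(a, b))     # 1 - cosine similarity (angle, not magnitude)

# SciPy's hamming returns the *fraction* of differing positions;
# multiply by the vector length to recover the count in the table's formula.
print(distance.hamming([1, 0, 1], [1, 1, 0]))  # 2 of 3 positions differ -> 0.666...
```

In scikit-learn, the same choice is made through `KNeighborsClassifier`'s `metric` parameter (for example `metric="manhattan"`).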
When to Use KNN
✓ Good For:
- Non-linear decision boundaries
- Local patterns in data
- Small to medium datasets
- Multi-class classification
- Recommendation systems
- Anomaly detection
- When data distribution is unknown
✗ Avoid When:
- High-dimensional data (curse of dimensionality; see the sketch after this list)
- Large datasets (slow prediction)
- Real-time applications
- Imbalanced datasets
- Features have very different scales
- Sparse data
- Need interpretable models
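One way to see the curse of dimensionality is that, for random data, the nearest and farthest training points become almost equidistant from a query as the number of features grows, so "nearest" stops carrying information. The sketch below assumes NumPy and uses uniformly random points purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# As the number of features grows, the nearest and farthest training points
# end up at nearly the same distance from the query point.
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))   # 500 random "training" points
    q = rng.random(d)          # one random query point
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d = {d:4d}   min/max distance ratio = {dists.min() / dists.max():.2f}")
```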
Pros and Cons
Pros | Cons |
---|---|
Simple to understand and implement | Computationally expensive at prediction time |
No assumptions about data distribution | Sensitive to irrelevant features |
Works well with small datasets | Suffers from curse of dimensionality |
Can be used for both classification and regression | Sensitive to local structure of data |
Adapts to new data automatically | Memory intensive (stores all training data) |
Key Takeaways
- KNN is a simple, non-parametric algorithm that makes no assumptions about data distribution
- The choice of k is crucial: small k leads to overfitting, large k leads to underfitting
- Feature scaling is essential since KNN relies on distance calculations (see the pipeline sketch after this list)
- Distance metric selection depends on your data type and problem domain
- KNN is computationally expensive at prediction time but requires no training
- Works well for local patterns but struggles with high-dimensional data
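As a concrete example of the feature-scaling takeaway, the sketch below compares an unscaled KNN classifier with one wrapped in a standardization pipeline; it assumes scikit-learn and uses its bundled wine dataset, whose features have very different numeric ranges:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Without scaling, features with large numeric ranges dominate the distance
raw = KNeighborsClassifier(n_neighbors=5)

# With standardization, every feature contributes on a comparable scale
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print("unscaled accuracy:", cross_val_score(raw, X, y, cv=5).mean().round(3))
print("scaled accuracy:  ", cross_val_score(scaled, X, y, cv=5).mean().round(3))
```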