Support Vector Machines (SVM)

Introduction

Support Vector Machines (SVMs) are powerful supervised learning algorithms that find the optimal decision boundary between classes. The key insight is to maximize the margin—the distance between the decision boundary and the nearest points from each class.

SVMs are particularly effective in high-dimensional spaces and are memory efficient since they only use a subset of training points (support vectors) in the decision function.

How SVMs Work

1. Maximum Margin Principle

SVM finds the hyperplane that separates classes with the largest possible margin. This helps improve generalization to unseen data.
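Written out, the standard hard-margin formulation makes this concrete: the constraints force every point onto the correct side with functional margin at least 1, and the resulting geometric margin is 2/‖w‖, so minimizing ‖w‖ is the same as maximizing the margin.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \ \ \text{for all } i,
\qquad \text{margin width} = \frac{2}{\lVert w \rVert}
```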

2. Support Vectors

Only the points closest to the decision boundary (support vectors) matter for defining the optimal hyperplane. Other points can be ignored.
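As a quick illustration (a minimal scikit-learn sketch on synthetic data, separate from the widget below), you can fit a linear SVM and inspect exactly which training points were kept as support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two slightly overlapping blobs labelled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 1.0, (20, 2)),
               rng.normal([2, 2], 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only these points define the hyperplane; removing any other point
# would leave the learned boundary unchanged.
print("support vector indices:", clf.support_)
print("support vectors per class:", clf.n_support_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
```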

3. Kernel Trick

For non-linearly separable data, SVMs use kernels to implicitly map data to higher dimensions where linear separation becomes possible.
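For instance (a hedged sketch using scikit-learn's built-in synthetic data, not tied to this page's widget), an RBF kernel separates concentric circles that no straight line can:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # typically near chance
print("RBF kernel accuracy:   ", rbf.score(X, y))     # typically near 1.0
```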

Interactive SVM Trainer

Watch a real SVM algorithm in action! This widget implements actual SVM training using gradient descent. You can see the optimization process step-by-step, observe how support vectors are identified, and experiment with different parameters to understand their effects:

Click: add blue point (+1) • Shift+Click: add red point (-1)

The widget panel exposes two parameters, the regularization strength C (higher C = less tolerance for misclassification) and the number of training iterations, and displays live model statistics (data points, support vectors, training steps, accuracy, margin) alongside the learned model weights w₁, w₂, and b.
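The widget's exact code isn't reproduced here, but the loop it describes amounts to sub-gradient descent on the soft-margin (hinge-loss) objective ½‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The sketch below is illustrative only; the function and parameter names are assumptions, not taken from the widget.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, iterations=200, lr=0.01):
    """Sub-gradient descent on 0.5*||w||^2 + C * sum of hinge losses.

    X: (n, 2) array of points; y: array of +1 / -1 labels.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(iterations):
        margins = y * (X @ w + b)
        violating = margins < 1  # inside the margin or misclassified
        # Sub-gradient of the objective with respect to w and b.
        grad_w = w - C * (y[violating, None] * X[violating]).sum(axis=0)
        grad_b = -C * y[violating].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # margin width after training is 2 / ||w||
```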

Visual Guide:

  • 🔴 Red line: Decision boundary (trained)
  • ⚫ Grey circles: Support vectors
  • -- Dashed lines: Margin boundaries (±1)
  • 🔵 Blue points: Class +1
  • 🔴 Red points: Class -1
  • 🟡 Yellow borders: Misclassified points
  • 🟢 Green borders: Correctly classified

How to Use:

  1. Click to add blue points (+1 class)
  2. Shift+click to add red points (-1 class)
  3. Click "Train SVM" to see optimization
  4. Use "Single Step" for step-by-step learning
  5. Adjust C parameter to see margin changes
  6. Try different kernels (Linear vs RBF)

Key Parameters

C (Regularization Parameter)

  • High C: Hard margin, less tolerance for misclassification
  • Low C: Soft margin, more tolerance for misclassification
  • Effect: Controls bias-variance tradeoff
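As a rough illustration of that tradeoff (a scikit-learn sketch on synthetic blobs; exact numbers depend on the data), a small C usually keeps more support vectors and a wider margin than a large C:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])  # geometric margin width
    print(f"C={C:>6}: support vectors={len(clf.support_)}, margin={margin:.3f}")
```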

Kernel Types

  • Linear: Best for linearly separable data
  • RBF: Good for non-linear patterns
  • Polynomial: Captures polynomial relationships
  • Sigmoid: Similar to neural networks
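In scikit-learn these choices map directly onto the kernel argument of SVC; the quick comparison below uses default hyperparameters purely for illustration (gamma, degree, and coef0 would normally be tuned):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ("linear", "rbf", "poly", "sigmoid"):
    clf = SVC(kernel=kernel)  # defaults only; results vary with tuning
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:>7}: mean CV accuracy = {score:.3f}")
```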

When to Use SVMs

✓ Good For:

  • High-dimensional data (text, images)
  • Small to medium-sized datasets
  • Clear margin between classes
  • Non-linear classification (with kernels)
  • Memory-efficient requirements
  • Robust to outliers (with appropriate C)

✗ Avoid When:

  • Very large datasets (slow training)
  • Many noisy features
  • Need probability estimates
  • Highly overlapping classes
  • Real-time prediction requirements
  • Interpretability is critical

Pros and Cons

Pros:

  • Strong theoretical foundation
  • Effective in high dimensions
  • Memory efficient (uses support vectors)
  • Versatile (different kernels)
  • Good generalization

Cons:

  • Slow training on large datasets
  • Sensitive to feature scaling
  • No probabilistic output
  • Difficult to interpret
  • Parameter tuning required

Key Takeaways

  • SVMs find the maximum margin hyperplane for optimal generalization
  • Support vectors are the critical points that define the decision boundary
  • Kernels allow SVMs to handle non-linear classification problems
  • The C parameter controls the bias-variance tradeoff
  • SVMs work well with high-dimensional data but require feature scaling
  • Best suited for small to medium datasets with clear class separation
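Because the feature-scaling point trips people up in practice, one standard way to handle it is to make scaling part of the model with a scikit-learn pipeline (a generic sketch, not specific to this page; the train/test variable names are placeholders):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The scaler is fit only on training data inside each fit call,
# then applied consistently at prediction time.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# model.fit(X_train, y_train)
# model.predict(X_test)
```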