Loss Functions & Optimization

Introduction

Loss functions measure how wrong our predictions are, while optimization algorithms help us minimize this loss. Together, they form the core of how machine learning models learn.

Interactive Loss Function Comparison

Explore different loss functions and see how they penalize prediction errors differently:

(Interactive widget: choose a loss function and move the prediction; the panel reports the current loss and gradient, for example loss 0.1600 with gradient -0.8000, and a red arrow shows the direction to move the prediction to reduce the loss.)
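
Those values are consistent with squared-error loss at a residual of 0.4: (0.4)² = 0.16 for the loss and -2 × 0.4 = -0.8 for the gradient with respect to the prediction. Below is a minimal sketch of that computation; the target y = 1.0 and prediction ŷ = 0.6 are assumed values chosen to match the display.

    # Squared-error loss and its gradient at a single point.
    # y = 1.0 and y_hat = 0.6 are assumed values chosen to reproduce
    # the display above (loss 0.1600, gradient -0.8000).
    def mse_loss(y, y_hat):
        return (y - y_hat) ** 2

    def mse_grad(y, y_hat):
        # d/d(y_hat) of (y - y_hat)^2
        return -2.0 * (y - y_hat)

    y, y_hat = 1.0, 0.6
    print(f"Loss: {mse_loss(y, y_hat):.4f}")      # Loss: 0.1600
    print(f"Gradient: {mse_grad(y, y_hat):.4f}")  # Gradient: -0.8000

The negative gradient tells us the prediction should move up, toward the target: exactly what the red arrow indicates.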

Loss Landscape & Gradient Descent

Visualize how gradient descent navigates a 2D loss landscape to find the minimum:

(Interactive widget: a contour plot of the loss landscape for the Rosenbrock function, shaded from low to high loss, with optimization controls. The descent starts at w₁ = -1.500, w₂ = 1.500, where the loss is 9.0625.)

Tips:

  • Small learning rates: slow but stable
  • Large learning rates: fast but may overshoot
  • The path traces how gradient descent navigates the loss surface
  • Each step follows the negative gradient, the direction of steepest descent (a minimal code sketch of this loop follows below)
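
The demo's surface is consistent with a Rosenbrock-style loss f(w₁, w₂) = (1 - w₁)² + b(w₂ - w₁²)²; b = 5 is an assumed scale chosen because it reproduces the displayed starting loss of 9.0625 at (-1.5, 1.5). A minimal sketch of the descent loop, not the demo's exact implementation:

    # Gradient descent on a Rosenbrock-style loss.
    # f(w1, w2) = (1 - w1)^2 + b * (w2 - w1^2)^2
    # b = 5 and the start (-1.5, 1.5) are assumptions chosen to match the
    # displayed starting loss of 9.0625; the true minimum is at (1, 1).
    def loss(w1, w2, b=5.0):
        return (1 - w1) ** 2 + b * (w2 - w1 ** 2) ** 2

    def grad(w1, w2, b=5.0):
        dw1 = -2 * (1 - w1) - 4 * b * w1 * (w2 - w1 ** 2)
        dw2 = 2 * b * (w2 - w1 ** 2)
        return dw1, dw2

    w1, w2 = -1.5, 1.5   # starting position from the demo
    lr = 0.005           # small learning rate: slow but stable
    for step in range(5001):
        if step % 1000 == 0:
            print(f"step {step:4d}: w1={w1:+.3f} w2={w2:+.3f} "
                  f"loss={loss(w1, w2):.4f}")
        dw1, dw2 = grad(w1, w2)
        w1 -= lr * dw1   # step against the gradient
        w2 -= lr * dw2   # (negative gradient = steepest descent)

Raising lr far above this value makes the iterates overshoot the curved valley and diverge, which is the instability the tips above warn about.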

Common Loss Functions

Regression Losses

  • MSE: Standard for regression. Squares errors, so large errors dominate and outliers pull the fit.
  • MAE: Robust to outliers, but its gradient is piecewise constant and non-smooth at zero.
  • Huber: Quadratic for small errors, linear for large ones: smooth like MSE near the minimum, robust like MAE to outliers (all three compared in the sketch below).
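
As a comparison, here is a minimal sketch of all three on the same residuals. NumPy is assumed for vectorized math; δ = 1.0 is a common Huber threshold, and the 0.5 factor is the standard convention that makes the two branches meet smoothly.

    import numpy as np

    def mse(residual):
        return residual ** 2

    def mae(residual):
        return np.abs(residual)

    def huber(residual, delta=1.0):
        # Quadratic within |r| <= delta, linear beyond it.
        small = np.abs(residual) <= delta
        return np.where(small,
                        0.5 * residual ** 2,
                        delta * (np.abs(residual) - 0.5 * delta))

    residuals = np.array([0.1, 0.5, 1.0, 5.0])  # 5.0 plays the outlier
    print("MSE:  ", mse(residuals))    # outlier dominates: 25.0
    print("MAE:  ", mae(residuals))    # grows linearly: 5.0
    print("Huber:", huber(residuals))  # linear past delta: 4.5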

Classification Losses

  • Cross-Entropy: Standard for classification. Operates on predicted probabilities.
  • Hinge Loss: Used in SVMs. Encourages a margin between classes.
  • Focal Loss: Down-weights easy examples to counter class imbalance (all three sketched below).
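
A minimal sketch of the three in their binary forms. The hinge loss assumes labels in {-1, +1} and a raw score rather than a probability; γ = 2 and α = 0.25 are the focal-loss defaults from Lin et al.

    import numpy as np

    def binary_cross_entropy(p, y):
        # y in {0, 1}, p = predicted probability of class 1.
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    def hinge(score, y):
        # y in {-1, +1}, score = raw model output; zero loss past the margin.
        return np.maximum(0.0, 1 - y * score)

    def focal(p, y, gamma=2.0, alpha=0.25):
        # Down-weights well-classified examples by (1 - p_t)^gamma.
        p_t = np.where(y == 1, p, 1 - p)
        alpha_t = np.where(y == 1, alpha, 1 - alpha)
        return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

    p = np.array([0.9, 0.6, 0.1])   # predicted P(class 1)
    y = np.array([1, 1, 1])         # all true positives
    print("Cross-entropy:", binary_cross_entropy(p, y))
    print("Focal:        ", focal(p, y))  # easy example (p=0.9) shrinks most

    scores = np.array([2.0, 0.5, -1.0])
    print("Hinge:        ", hinge(scores, y))  # zero once margin is cleared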

Interactive Loss Function Explorer

(Interactive explorer: choose a loss function to see its formula, description, and typical uses. The Mean Squared Error entry is reproduced below.)

Mean Squared Error (MSE)

Formula:
L(y, ŷ) = (y - ŷ)²
Description:

Squares the difference between predicted and actual values. Heavily penalizes large errors, making it sensitive to outliers. Most common for regression tasks.

When to use:

Linear regression and neural networks that predict continuous values.
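
The gradient that drives learning follows by the chain rule: differentiating with respect to the prediction gives

∂L/∂ŷ = -2(y - ŷ)

so when the prediction is below the target (y - ŷ > 0) the gradient is negative, and the update ŷ ← ŷ - η ∂L/∂ŷ moves the prediction up toward the target (η is the learning rate).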

Key Takeaways

  • Loss functions quantify how wrong our predictions are
  • Different loss functions are suited for different problems
  • Gradient descent steps along the negative gradient, the direction of steepest descent
  • Learning rate controls the step size in optimization
  • The loss landscape can have multiple local minima
  • Understanding loss functions and optimization is crucial for debugging ML models