Loss Functions & Optimization
Introduction
Loss functions measure how wrong our predictions are, while optimization algorithms help us minimize this loss. Together, they form the core of how machine learning models learn.
Interactive Loss Function Comparison
Explore different loss functions and see how they penalize prediction errors differently:
[Interactive widget: choose a loss function and adjust the prediction to see the resulting loss and its gradient. For the default setting the readout is Loss = 0.1600, Gradient = -0.8000, and a red arrow shows the direction to move the prediction to reduce the loss.]
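As a sanity check, the readout above matches the squared-error loss L(y, ŷ) = (y - ŷ)² at, for example, a target y = 1.0 and a prediction ŷ = 0.6 (these particular values are an assumption; any pair with y - ŷ = 0.4 gives the same numbers). A minimal sketch:

```python
def mse_loss_and_grad(y, y_hat):
    """Squared-error loss for a single prediction and its gradient w.r.t. y_hat."""
    loss = (y - y_hat) ** 2
    grad = -2.0 * (y - y_hat)   # derivative of (y - y_hat)^2 with respect to y_hat
    return loss, grad

# Assumed target/prediction pair that reproduces the widget readout:
loss, grad = mse_loss_and_grad(y=1.0, y_hat=0.6)
print(loss, grad)   # 0.16  -0.8  -> a negative gradient means "increase the prediction"
```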
Loss Landscape & Gradient Descent
Visualize how gradient descent navigates a 2D loss landscape to find the minimum:
[Interactive widget: the panel plots the loss landscape of the Rosenbrock function (dark regions are low loss, bright regions are high loss). The optimization controls step gradient descent from the starting position w₁ = -1.500, w₂ = 1.500, where Loss = 9.0625.]
Tips:
- Small learning rates: slow but stable
- Large learning rates: fast but may overshoot
- The path shows how gradient descent navigates the loss surface
- Notice how each step follows the direction of steepest descent (the negative gradient); the sketch after these tips reproduces this loop in code
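A minimal sketch of the descent loop, assuming the widget uses the Rosenbrock function f(w₁, w₂) = (a - w₁)² + b·(w₂ - w₁²)² with a = 1 and b = 5 (b = 5 is inferred because it reproduces the starting readout of 9.0625 at w₁ = -1.5, w₂ = 1.5; the learning rate and step count are arbitrary choices):

```python
import numpy as np

# Rosenbrock loss f(w1, w2) = (A - w1)^2 + B*(w2 - w1^2)^2.
# A = 1 is the standard choice; B = 5 is assumed so the starting readout above is reproduced.
A, B = 1.0, 5.0

def rosenbrock(w):
    w1, w2 = w
    return (A - w1) ** 2 + B * (w2 - w1 ** 2) ** 2

def rosenbrock_grad(w):
    w1, w2 = w
    dw1 = -2.0 * (A - w1) - 4.0 * B * w1 * (w2 - w1 ** 2)
    dw2 = 2.0 * B * (w2 - w1 ** 2)
    return np.array([dw1, dw2])

w = np.array([-1.5, 1.5])        # starting position from the widget
learning_rate = 0.01             # assumed; larger values speed things up but can overshoot
print(rosenbrock(w))             # 9.0625, matching the readout above

for step in range(2000):
    w = w - learning_rate * rosenbrock_grad(w)   # step along the steepest-descent direction

print(w, rosenbrock(w))          # close to the global minimum at (1, 1)
```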
Common Loss Functions
Regression Losses
- MSE: Standard for regression. Sensitive to outliers.
- MAE: Robust to outliers, but its gradient has constant magnitude and is undefined at zero error.
- Huber: Combines the two: quadratic for small errors (smooth), linear for large errors (robust to outliers); see the sketch below.
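The differences are easiest to see on a small batch with one outlier. A minimal sketch (the Huber threshold δ = 1.0 and the sample values are assumptions for illustration):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond; delta = 1.0 is an assumed default.
    err = np.abs(y - y_hat)
    quad = 0.5 * err ** 2
    lin = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quad, lin))

y     = np.array([1.0, 2.0, 3.0, 10.0])   # last target is an outlier
y_hat = np.array([1.1, 1.9, 3.2, 3.0])
print(mse(y, y_hat), mae(y, y_hat), huber(y, y_hat))
# MSE is dominated by the outlier; MAE and Huber grow only linearly with it.
```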
Classification Losses
- Cross-Entropy: Standard for classification. Well-suited for probabilities.
- Hinge Loss: Used in SVMs. Encourages a margin between classes.
- Focal Loss: Addresses class imbalance by down-weighting easy examples (see the sketch below).
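A minimal sketch of the three classification losses for single binary examples (the focal-loss exponent γ = 2.0 follows the original focal-loss paper; cross-entropy and focal loss take a predicted probability with labels y ∈ {0, 1}, while hinge loss takes a raw score with labels y ∈ {-1, +1}):

```python
import numpy as np

def binary_cross_entropy(y, p):
    # y in {0, 1}, p = predicted probability of class 1
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge(y, score):
    # y in {-1, +1}, score = raw model output; zero loss once the example
    # is on the correct side of the margin (y * score >= 1)
    return np.maximum(0.0, 1.0 - y * score)

def focal(y, p, gamma=2.0):
    # Cross-entropy scaled by (1 - p_t)^gamma, which down-weights easy,
    # well-classified examples; gamma = 2.0 follows the original paper.
    p_t = np.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * np.log(p_t)

# An easy example (p = 0.9 for the true class) vs. a harder one (p = 0.6):
for p in (0.9, 0.6):
    print(p, binary_cross_entropy(1, p), focal(1, p))
print(hinge(+1, 2.0), hinge(+1, 0.3))   # 0.0 (outside margin) vs. 0.7 (inside margin)
```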
Interactive Loss Function Explorer
[Interactive explorer: select a loss function to see its formula, description, and typical uses. With Mean Squared Error (MSE) selected:]
Formula:
L(y, ŷ) = (y - ŷ)²
Description:
Squares the difference between predicted and actual values. Heavily penalizes large errors, making it sensitive to outliers. Most common for regression tasks.
When to use:
Linear regression and neural networks that output continuous values; a minimal training-loop sketch follows below.
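To connect the formula to a full training loop, here is a minimal sketch fitting a one-variable linear regression by gradient descent on the MSE loss (the synthetic data, learning rate, and step count are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=100)   # true slope 2.0, intercept 0.5

w, b = 0.0, 0.0          # parameters to learn
lr = 0.1                 # learning rate (assumed; see the tips above)
for _ in range(500):
    y_hat = w * x + b
    # Gradients of the mean squared error L = mean((y - y_hat)^2)
    grad_w = np.mean(-2.0 * (y - y_hat) * x)
    grad_b = np.mean(-2.0 * (y - y_hat))
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # close to the true slope 2.0 and intercept 0.5
```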
Key Takeaways
- Loss functions quantify how wrong our predictions are
- Different loss functions are suited for different problems
- Gradient descent uses the derivative to find the direction of steepest descent
- Learning rate controls the step size in optimization
- The loss landscape can have multiple local minima
- Understanding loss functions and optimization is crucial for debugging ML models