Convolutional Neural Networks (CNNs)

Introduction

Convolutional Neural Networks (CNNs) are designed for processing grid-like data such as images. They use convolution operations to detect local features like edges, textures, and patterns.

Key Concepts

Convolution Operation

  • Slides filters across input to detect features
  • Parameter sharing: the same filter weights are reused at every spatial position, which cuts parameter count and reduces overfitting
  • Translation equivariance: a shifted input produces a correspondingly shifted feature map, so features are detected wherever they appear
  • Creates feature maps highlighting patterns
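The sliding-filter operation above can be sketched directly in NumPy. This is a minimal "valid" 2D convolution (no padding, stride 1) applied to a toy image with a vertical edge; the image contents and kernel values here are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image,
    computing an elementwise product and sum at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: left half dark (0), right half bright (1) -> a vertical edge
img = np.zeros((5, 5))
img[:, 3:] = 1.0

vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])

fmap = conv2d(img, vertical_edge)  # 5x5 input, 3x3 kernel -> 3x3 feature map
```

The feature map responds strongly (value 3.0) exactly where the window straddles the dark-to-bright transition, and is zero in flat regions, which is what "detecting a vertical edge" means in practice.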

Pooling Layers

  • Reduces spatial dimensions
  • Max pooling: takes maximum value
  • Average pooling: takes average value
  • Provides translation invariance
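Max pooling can be sketched the same way. This hypothetical 2×2, stride-2 example halves each spatial dimension by keeping only the strongest activation in each window:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep the maximum value in each size x size window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 2.],
                 [0., 1., 5., 7.],
                 [2., 3., 8., 4.]])

pooled = max_pool(fmap)  # 4x4 -> 2x2
```

Because only the window maximum survives, small shifts of a feature within a window leave the pooled output unchanged; this is the source of the (local) translation invariance mentioned above. Average pooling is identical except `window.max()` becomes `window.mean()`.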

Interactive Convolution Playground

Experiment with different filters to see how convolution works. Draw on the input image and observe the effects:

Input Image (32×32)

Click to draw | Shift+click to erase | Alt+click to analyze pixel | Ctrl+click for precise

Convolution Output

Filter Selection

Detects vertical edges

Current Kernel (3×3)

-1.0   0.0   1.0
-1.0   0.0   1.0
-1.0   0.0   1.0

Parameters

Image Presets

Feature Maps Visualization

See how multiple filters create different feature maps, followed by pooling operations:

Top row: Convolution outputs | Bottom row: After pooling

CNN Architecture

Typical CNN architecture showing the flow from input through convolution, pooling, and fully connected layers:

Convolution: Feature extraction with learnable filters
Pooling: Spatial dimension reduction
Fully Connected: Classification/regression
Output: Final predictions
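The flow above can be traced numerically by following the spatial size through each stage. The layer sizes in this sketch are illustrative (a small 32×32 network), not taken from the demo:

```python
def conv_out(n, k, p=0, s=1):
    """Spatial output size for one conv/pool layer (floor division)."""
    return (n + 2 * p - k) // s + 1

# Trace spatial size through a hypothetical small CNN on a 32x32 input
n = 32
n = conv_out(n, k=3, p=1)        # conv 3x3, pad 1  -> 32
n = conv_out(n, k=2, s=2)        # max pool 2x2     -> 16
n = conv_out(n, k=3, p=1)        # conv 3x3, pad 1  -> 16
n = conv_out(n, k=2, s=2)        # max pool 2x2     -> 8
flat = n * n * 64                # flatten 64 channels for the FC layer
```

The fully connected classifier then operates on the `flat`-dimensional vector; each pooling stage halves the spatial size while convolution (with padding 1) preserves it.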

Receptive Field Analysis

Click on any pixel in the input to see its receptive field: the region of the input that influences that output value after convolution.

Receptive Field Visualization

Blue highlight shows the receptive field of the selected pixel

Receptive Field Details

Alt+click on the input image above to analyze a pixel's receptive field

💡 Understanding Receptive Fields

  • Each output pixel "sees" only a local region of the input
  • Deeper layers have larger receptive fields
  • This locality enables CNNs to detect features hierarchically
  • Small receptive fields detect edges; large ones detect objects
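How the receptive field grows with depth can be computed with a standard recurrence: each layer adds (kernel − 1) times the accumulated stride ("jump") of the layers before it. A small sketch over a hypothetical conv-pool-conv stack:

```python
def receptive_field(layers):
    """Receptive field of one output pixel, in input pixels.
    layers: list of (kernel_size, stride) pairs, input-first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) * jump
        jump *= s             # strides compound the step between outputs
    return rf

# Illustrative stack: 3x3 conv, 2x2/stride-2 pool, 3x3 conv
rf = receptive_field([(3, 1), (2, 2), (3, 1)])
```

Here a single output pixel already depends on an 8×8 input region; stacking more layers (or more pooling) grows the field further, which is why deep layers can respond to whole objects rather than edges.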

Mathematical Foundation

Convolution Operation:

(f * g)[n] = Σ f[i] × g[n - i]

Where f is the input and g is the kernel/filter. (Strictly, this definition flips the kernel; most deep learning frameworks actually implement cross-correlation, without the flip, but still call it convolution.)
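The 1D formula above can be implemented directly. This sketch computes the full discrete convolution of two short sequences chosen for illustration:

```python
def conv1d(f, g):
    """Full discrete convolution: (f * g)[n] = sum_i f[i] * g[n - i]."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for i in range(len(f)):
            if 0 <= n - i < len(g):   # only sum terms where g is defined
                out[n] += f[i] * g[n - i]
    return out

signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0]                    # a pure one-step delay
result = conv1d(signal, kernel)        # -> [0.0, 1.0, 2.0, 3.0]
```

Convolving with `[0, 1]` simply shifts the signal by one step, a quick sanity check that the index arithmetic `g[n - i]` matches the formula.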

Output Size Formula:

Output = ⌊(Input + 2×Padding − Kernel) / Stride⌋ + 1

(The floor handles strides that do not divide evenly; any leftover input positions are dropped.)
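A few worked instances of the formula, using the 32×32 input from the playground above:

```python
def output_size(n, k, p, s):
    """Output spatial size: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

same    = output_size(32, k=3, p=1, s=1)  # "same" padding keeps 32
valid   = output_size(32, k=3, p=0, s=1)  # no padding shrinks to 30
strided = output_size(32, k=3, p=1, s=2)  # stride 2 halves to 16
```

Padding of (Kernel − 1) / 2 with stride 1 preserves the input size, which is why 3×3 convolutions are almost always paired with padding 1.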

Key Takeaways

  • CNNs are specifically designed for processing spatial data like images
  • Convolution layers detect local features through learnable filters
  • Pooling layers reduce spatial dimensions and provide translation invariance
  • Parameter sharing in convolution makes CNNs efficient and robust
  • Different filters detect different types of features (edges, textures, shapes)
  • CNNs form the backbone of modern computer vision systems
  • Understanding convolution is crucial for working with image-based deep learning models