Building a Neural Network from Scratch

In this notebook, we'll build a complete neural network using only NumPy. We'll implement:

  • Forward propagation with multiple layers
  • Various activation functions
  • Backpropagation algorithm
  • Mini-batch gradient descent
  • Training on a small MNIST-style digit classification task (scikit-learn's 8×8 digits)

This hands-on implementation will help you understand the inner workings of neural networks.

In [14]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_moons, make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

1. Activation Functions

Let's implement common activation functions and their derivatives.

In [15]:
class Activations:
    @staticmethod
    def relu(x):
        """ReLU activation function."""
        return np.maximum(0, x)
    
    @staticmethod
    def relu_derivative(x):
        """Derivative of ReLU."""
        return (x > 0).astype(float)
    
    @staticmethod
    def sigmoid(x):
        """Sigmoid activation function."""
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    
    @staticmethod
    def sigmoid_derivative(x):
        """Derivative of sigmoid."""
        s = Activations.sigmoid(x)
        return s * (1 - s)
    
    @staticmethod
    def tanh(x):
        """Tanh activation function."""
        return np.tanh(x)
    
    @staticmethod
    def tanh_derivative(x):
        """Derivative of tanh."""
        return 1 - np.tanh(x) ** 2
    
    @staticmethod
    def softmax(x):
        """Softmax activation for multi-class classification."""
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Visualize activation functions
def plot_activation_functions():
    x = np.linspace(-5, 5, 100)
    
    fig, axes = plt.subplots(2, 3, figsize=(12, 8))
    
    # ReLU
    axes[0, 0].plot(x, Activations.relu(x), 'b-', linewidth=2)
    axes[0, 0].set_title('ReLU', fontweight='bold')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].axhline(y=0, color='k', linewidth=0.5)
    axes[0, 0].axvline(x=0, color='k', linewidth=0.5)
    
    axes[1, 0].plot(x, Activations.relu_derivative(x), 'r-', linewidth=2)
    axes[1, 0].set_title('ReLU Derivative', fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].axhline(y=0, color='k', linewidth=0.5)
    axes[1, 0].axvline(x=0, color='k', linewidth=0.5)
    
    # Sigmoid
    axes[0, 1].plot(x, Activations.sigmoid(x), 'b-', linewidth=2)
    axes[0, 1].set_title('Sigmoid', fontweight='bold')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].axhline(y=0, color='k', linewidth=0.5)
    axes[0, 1].axvline(x=0, color='k', linewidth=0.5)
    
    axes[1, 1].plot(x, Activations.sigmoid_derivative(x), 'r-', linewidth=2)
    axes[1, 1].set_title('Sigmoid Derivative', fontweight='bold')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].axhline(y=0, color='k', linewidth=0.5)
    axes[1, 1].axvline(x=0, color='k', linewidth=0.5)
    
    # Tanh
    axes[0, 2].plot(x, Activations.tanh(x), 'b-', linewidth=2)
    axes[0, 2].set_title('Tanh', fontweight='bold')
    axes[0, 2].grid(True, alpha=0.3)
    axes[0, 2].axhline(y=0, color='k', linewidth=0.5)
    axes[0, 2].axvline(x=0, color='k', linewidth=0.5)
    
    axes[1, 2].plot(x, Activations.tanh_derivative(x), 'r-', linewidth=2)
    axes[1, 2].set_title('Tanh Derivative', fontweight='bold')
    axes[1, 2].grid(True, alpha=0.3)
    axes[1, 2].axhline(y=0, color='k', linewidth=0.5)
    axes[1, 2].axvline(x=0, color='k', linewidth=0.5)
    
    plt.suptitle('Activation Functions and Their Derivatives', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

plot_activation_functions()
[Figure: Activation Functions and Their Derivatives]
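
As a quick numerical sanity check (an added sketch, not part of the original cells), we can compare each analytic derivative against a central finite difference; ReLU is left out because its derivative is undefined at zero.

In [ ]:
def finite_difference(f, x, eps=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x_check = np.linspace(-3, 3, 13)
for name, f, df in [('sigmoid', Activations.sigmoid, Activations.sigmoid_derivative),
                    ('tanh', Activations.tanh, Activations.tanh_derivative)]:
    max_err = np.max(np.abs(df(x_check) - finite_difference(f, x_check)))
    # The analytic and numeric derivatives should agree to roughly 1e-9 or better
    print(f"{name}: max |analytic - numeric| = {max_err:.2e}")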

2. Neural Network Implementation

In [16]:
class NeuralNetwork:
    def __init__(self, layer_sizes, activation='relu', learning_rate=0.01):
        """
        Initialize neural network.
        
        Parameters:
        - layer_sizes: list of integers representing neurons in each layer
        - activation: activation function ('relu', 'sigmoid', 'tanh')
        - learning_rate: learning rate for gradient descent
        """
        self.layer_sizes = layer_sizes
        self.n_layers = len(layer_sizes)
        self.learning_rate = learning_rate
        
        # Set activation functions
        self.activation_name = activation
        if activation == 'relu':
            self.activation = Activations.relu
            self.activation_derivative = Activations.relu_derivative
        elif activation == 'sigmoid':
            self.activation = Activations.sigmoid
            self.activation_derivative = Activations.sigmoid_derivative
        elif activation == 'tanh':
            self.activation = Activations.tanh
            self.activation_derivative = Activations.tanh_derivative
        else:
            raise ValueError(f"Unsupported activation: {activation!r}")
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.n_layers):
            # He initialization for ReLU, Xavier for others
            if activation == 'relu':
                w = np.random.randn(layer_sizes[i-1], layer_sizes[i]) * np.sqrt(2.0 / layer_sizes[i-1])
            else:
                w = np.random.randn(layer_sizes[i-1], layer_sizes[i]) * np.sqrt(1.0 / layer_sizes[i-1])
            
            b = np.zeros((1, layer_sizes[i]))
            self.weights.append(w)
            self.biases.append(b)
        
        # Storage for training history
        self.loss_history = []
        self.accuracy_history = []
    
    def forward_propagation(self, X):
        """
        Perform forward propagation.
        
        Returns:
        - activations: list of activations for each layer
        - z_values: list of weighted sums for each layer
        """
        activations = [X]
        z_values = []
        
        for i in range(self.n_layers - 1):
            z = activations[-1] @ self.weights[i] + self.biases[i]
            z_values.append(z)
            
            # Use softmax for output layer in multi-class classification
            if i == self.n_layers - 2 and self.layer_sizes[-1] > 1:
                a = Activations.softmax(z)
            # Use sigmoid for binary classification output
            elif i == self.n_layers - 2 and self.layer_sizes[-1] == 1:
                a = Activations.sigmoid(z)
            else:
                a = self.activation(z)
            
            activations.append(a)
        
        return activations, z_values
    
    def compute_loss(self, y_true, y_pred):
        """
        Compute loss based on problem type.
        """
        m = y_true.shape[0]
        
        # Binary cross-entropy for binary classification
        if self.layer_sizes[-1] == 1:
            y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
            loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        # Categorical cross-entropy for multi-class
        else:
            y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
            loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
        
        return loss
    
    def backward_propagation(self, X, y, activations, z_values):
        """
        Perform backward propagation and update weights.
        """
        m = X.shape[0]
        
        # Initialize gradients storage
        dw_list = [None] * (self.n_layers - 1)
        db_list = [None] * (self.n_layers - 1)
        
        # Output layer gradient: for sigmoid + binary cross-entropy and for
        # softmax + categorical cross-entropy, dL/dz simplifies to (y_hat - y),
        # so both cases share the same expression.
        dz = activations[-1] - y
        
        # Backpropagate through layers
        for layer_idx in range(self.n_layers - 2, -1, -1):
            # Calculate weight and bias gradients for this layer
            dw = (1/m) * (activations[layer_idx].T @ dz)
            db = (1/m) * np.sum(dz, axis=0, keepdims=True)
            
            # Store gradients
            dw_list[layer_idx] = dw
            db_list[layer_idx] = db
            
            # Calculate gradient for previous layer (if not at input layer)
            if layer_idx > 0:
                # Propagate gradient backwards through current weights
                da = dz @ self.weights[layer_idx].T
                # Apply activation derivative from the previous layer
                dz = da * self.activation_derivative(z_values[layer_idx - 1])
        
        # Update all weights and biases
        for i in range(self.n_layers - 1):
            self.weights[i] -= self.learning_rate * dw_list[i]
            self.biases[i] -= self.learning_rate * db_list[i]
    
    def train(self, X, y, epochs=100, batch_size=32, validation_split=0.2, verbose=True):
        """
        Train the neural network.
        """
        # Split data for validation
        n_samples = X.shape[0]
        n_val = int(n_samples * validation_split)
        
        indices = np.random.permutation(n_samples)
        X_train = X[indices[n_val:]]
        y_train = y[indices[n_val:]]
        X_val = X[indices[:n_val]]
        y_val = y[indices[:n_val]]
        
        n_train = X_train.shape[0]
        
        for epoch in range(epochs):
            # Shuffle training data
            indices = np.random.permutation(n_train)
            X_train = X_train[indices]
            y_train = y_train[indices]
            
            # Mini-batch training
            for i in range(0, n_train, batch_size):
                batch_X = X_train[i:i+batch_size]
                batch_y = y_train[i:i+batch_size]
                
                # Forward propagation
                activations, z_values = self.forward_propagation(batch_X)
                
                # Backward propagation
                self.backward_propagation(batch_X, batch_y, activations, z_values)
            
            # Calculate validation metrics
            val_activations, _ = self.forward_propagation(X_val)
            val_loss = self.compute_loss(y_val, val_activations[-1])
            
            if self.layer_sizes[-1] == 1:
                val_predictions = (val_activations[-1] > 0.5).astype(int)
                val_accuracy = np.mean(val_predictions == y_val)
            else:
                val_predictions = np.argmax(val_activations[-1], axis=1)
                val_accuracy = np.mean(val_predictions == np.argmax(y_val, axis=1))
            
            self.loss_history.append(val_loss)
            self.accuracy_history.append(val_accuracy)
            
            if verbose and epoch % 10 == 0:
                print(f"Epoch {epoch:3d} - Loss: {val_loss:.4f}, Accuracy: {val_accuracy:.4f}")
    
    def predict(self, X):
        """Make predictions."""
        activations, _ = self.forward_propagation(X)
        
        if self.layer_sizes[-1] == 1:
            return (activations[-1] > 0.5).astype(int)
        else:
            return np.argmax(activations[-1], axis=1)

print("Neural Network class defined successfully!")
Neural Network class defined successfully!
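
Before training on real data, a small smoke test (an added sketch; the tiny network and random batch below are purely illustrative) confirms that a forward pass produces the expected output shape and that a single backward pass typically lowers the loss on the same batch.

In [ ]:
X_check = np.random.randn(8, 4)
y_check = (np.random.rand(8, 1) > 0.5).astype(float)

nn_check = NeuralNetwork(layer_sizes=[4, 5, 1], activation='tanh', learning_rate=0.5)
acts, zs = nn_check.forward_propagation(X_check)
print("Output shape:", acts[-1].shape)  # expected: (8, 1)

loss_before = nn_check.compute_loss(y_check, acts[-1])
nn_check.backward_propagation(X_check, y_check, acts, zs)
acts_after, _ = nn_check.forward_propagation(X_check)
loss_after = nn_check.compute_loss(y_check, acts_after[-1])
# One gradient step on the same batch should usually reduce the loss
print(f"Loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")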

3. Test on Simple Classification Problem

In [17]:
# Generate moon dataset
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)
y = y.reshape(-1, 1)

# Normalize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Create and train neural network
nn = NeuralNetwork(
    layer_sizes=[2, 16, 8, 1],  # 2 inputs, two hidden layers (16 and 8), 1 output
    activation='relu',
    learning_rate=0.1
)

print("Training neural network on moon dataset...")
nn.train(X, y, epochs=100, batch_size=32, verbose=False)
print(f"Final accuracy: {nn.accuracy_history[-1]:.4f}")
Training neural network on moon dataset...
Final accuracy: 1.0000
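
As an added usage sketch (not part of the original run), the trained model's predict method can also be applied to the full standardized dataset:

In [ ]:
full_predictions = nn.predict(X)  # shape (1000, 1), values 0 or 1
print(f"Accuracy on the full moons dataset: {np.mean(full_predictions == y):.4f}")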

4. Visualize Decision Boundary and Training Progress

In [18]:
def plot_decision_boundary(X, y, model, title="Decision Boundary"):
    """Plot decision boundary for 2D data."""
    h = 0.02
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu, alpha=0.8)
    
    # Plot data points
    scatter = plt.scatter(X[:, 0], X[:, 1], c=y.ravel(), cmap=plt.cm.RdBu, 
                         edgecolor='black', linewidth=0.5)
    
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title, fontsize=14, fontweight='bold')
    plt.colorbar(scatter)
    plt.tight_layout()
    plt.show()

# Plot decision boundary
plot_decision_boundary(X, y, nn, "Neural Network Decision Boundary")

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(nn.loss_history, 'b-', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Validation Loss', fontweight='bold')
ax1.grid(True, alpha=0.3)

ax2.plot(nn.accuracy_history, 'g-', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Validation Accuracy', fontweight='bold')
ax2.grid(True, alpha=0.3)

plt.suptitle('Training Progress', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
[Figure: Neural Network Decision Boundary]
[Figure: Training Progress (validation loss and accuracy)]

5. Visualize Network Architecture

In [19]:
def visualize_network_architecture(nn):
    """Visualize the neural network architecture."""
    fig, ax = plt.subplots(figsize=(12, 8))
    
    layer_sizes = nn.layer_sizes
    n_layers = len(layer_sizes)
    
    # Calculate positions
    layer_positions = np.linspace(0.1, 0.9, n_layers)
    
    # Draw neurons and connections
    for layer_idx in range(n_layers):
        layer_size = layer_sizes[layer_idx]
        x_pos = layer_positions[layer_idx]
        
        # Calculate y positions for neurons in this layer
        y_positions = np.linspace(0.1, 0.9, layer_size)
        
        # Draw neurons
        for neuron_idx in range(layer_size):
            y_pos = y_positions[neuron_idx]
            
            # Draw connections to next layer
            if layer_idx < n_layers - 1:
                next_layer_size = layer_sizes[layer_idx + 1]
                next_x_pos = layer_positions[layer_idx + 1]
                next_y_positions = np.linspace(0.1, 0.9, next_layer_size)
                
                for next_neuron_idx in range(next_layer_size):
                    next_y_pos = next_y_positions[next_neuron_idx]
                    
                    # Get weight value
                    weight = nn.weights[layer_idx][neuron_idx, next_neuron_idx]
                    
                    # Color based on weight value
                    color = 'blue' if weight > 0 else 'red'
                    alpha = min(abs(weight) / 2, 0.8)
                    
                    ax.plot([x_pos, next_x_pos], [y_pos, next_y_pos], 
                           color=color, alpha=alpha, linewidth=0.5)
            
            # Draw neuron
            circle = plt.Circle((x_pos, y_pos), 0.02, 
                               color='darkblue', zorder=10)
            ax.add_patch(circle)
        
        # Add layer label
        if layer_idx == 0:
            label = f'Input\n({layer_size} neurons)'
        elif layer_idx == n_layers - 1:
            label = f'Output\n({layer_size} neurons)'
        else:
            label = f'Hidden {layer_idx}\n({layer_size} neurons)'
        
        ax.text(x_pos, 0.02, label, ha='center', fontsize=10)
    
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title('Neural Network Architecture\n(Blue: positive weights, Red: negative weights)', 
                fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()

visualize_network_architecture(nn)
[Figure: Neural Network Architecture (blue: positive weights, red: negative weights)]

6. Weight Visualization and Analysis

In [20]:
def visualize_weights(nn):
    """Visualize weight matrices as heatmaps."""
    n_weight_matrices = len(nn.weights)
    
    fig, axes = plt.subplots(1, n_weight_matrices, figsize=(4*n_weight_matrices, 4))
    
    if n_weight_matrices == 1:
        axes = [axes]
    
    for idx, (weights, ax) in enumerate(zip(nn.weights, axes)):
        im = ax.imshow(weights, cmap='coolwarm', aspect='auto')
        ax.set_title(f'Layer {idx+1} Weights\nShape: {weights.shape}')
        ax.set_xlabel('To neuron')
        ax.set_ylabel('From neuron')
        plt.colorbar(im, ax=ax)
    
    plt.suptitle('Weight Matrices Visualization', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

visualize_weights(nn)

# Weight statistics
print("\nWeight Statistics:")
for idx, weights in enumerate(nn.weights):
    print(f"Layer {idx+1}:")
    print(f"  Shape: {weights.shape}")
    print(f"  Mean: {weights.mean():.4f}")
    print(f"  Std: {weights.std():.4f}")
    print(f"  Min: {weights.min():.4f}")
    print(f"  Max: {weights.max():.4f}")
[Figure: Weight Matrices Visualization]
Weight Statistics:
Layer 1:
  Shape: (2, 16)
  Mean: -0.1511
  Std: 1.0939
  Min: -2.4372
  Max: 1.8965
Layer 2:
  Shape: (16, 8)
  Mean: 0.0186
  Std: 0.4826
  Min: -1.4957
  Max: 1.4920
Layer 3:
  Shape: (8, 1)
  Mean: 0.0808
  Std: 1.5586
  Min: -2.3508
  Max: 3.1191

7. Test on More Complex Dataset

In [21]:
# Generate a more complex dataset (circles)
X_complex, y_complex = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)
y_complex = y_complex.reshape(-1, 1)

# Normalize
X_complex = StandardScaler().fit_transform(X_complex)

# Create deeper network
nn_complex = NeuralNetwork(
    layer_sizes=[2, 32, 16, 8, 1],  # Deeper network
    activation='relu',
    learning_rate=0.1
)

print("Training on circles dataset...")
nn_complex.train(X_complex, y_complex, epochs=150, batch_size=32, verbose=False)
print(f"Final accuracy: {nn_complex.accuracy_history[-1]:.4f}")

# Visualize results
plot_decision_boundary(X_complex, y_complex, nn_complex, "Complex Dataset Decision Boundary")
Training on circles dataset...
Final accuracy: 0.9850
[Figure: Complex Dataset Decision Boundary]
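
For comparison (an added aside, not part of the original notebook), a shallower network can be trained on the same circles data to gauge how much the extra depth helps; no results are recorded here.

In [ ]:
nn_shallow = NeuralNetwork(
    layer_sizes=[2, 8, 1],  # single small hidden layer for comparison
    activation='relu',
    learning_rate=0.1
)
nn_shallow.train(X_complex, y_complex, epochs=150, batch_size=32, verbose=False)
print(f"Shallow network final validation accuracy: {nn_shallow.accuracy_history[-1]:.4f}")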

8. Activation Patterns Analysis

In [22]:
def analyze_activation_patterns(nn, X, n_samples=5):
    """Analyze and visualize activation patterns for sample inputs."""
    # Get random samples
    indices = np.random.choice(X.shape[0], n_samples, replace=False)
    X_samples = X[indices]
    
    # Get activations for each sample
    fig, axes = plt.subplots(n_samples, len(nn.layer_sizes), 
                            figsize=(3*len(nn.layer_sizes), 2*n_samples))
    
    for sample_idx, x in enumerate(X_samples):
        activations, _ = nn.forward_propagation(x.reshape(1, -1))
        
        for layer_idx, activation in enumerate(activations):
            ax = axes[sample_idx, layer_idx] if n_samples > 1 else axes[layer_idx]
            
            # Reshape activation for visualization
            act_values = activation.flatten()
            
            # Create bar plot
            ax.bar(range(len(act_values)), act_values, color='blue', alpha=0.7)
            # Leave the y-limits to autoscale: inputs can be negative and
            # ReLU activations are not bounded above by 1.
            ax.set_xlabel('Neuron')
            ax.set_ylabel('Activation')
            
            if sample_idx == 0:
                if layer_idx == 0:
                    ax.set_title(f'Input Layer')
                elif layer_idx == len(nn.layer_sizes) - 1:
                    ax.set_title(f'Output Layer')
                else:
                    ax.set_title(f'Hidden Layer {layer_idx}')
    
    plt.suptitle('Activation Patterns for Sample Inputs', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

analyze_activation_patterns(nn, X, n_samples=3)
[Figure: Activation Patterns for Sample Inputs]

9. Compare Different Activation Functions

In [23]:
def compare_activation_functions(X, y):
    """Compare performance of different activation functions."""
    activations = ['relu', 'sigmoid', 'tanh']
    results = {}
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    for idx, activation in enumerate(activations):
        # Train network
        nn = NeuralNetwork(
            layer_sizes=[2, 16, 8, 1],
            activation=activation,
            learning_rate=0.1
        )
        nn.train(X, y, epochs=100, batch_size=32, verbose=False)
        
        results[activation] = {
            'final_accuracy': nn.accuracy_history[-1],
            'final_loss': nn.loss_history[-1],
            'model': nn
        }
        
        # Plot training curves
        axes[idx].plot(nn.accuracy_history, linewidth=2)
        axes[idx].set_xlabel('Epoch')
        axes[idx].set_ylabel('Accuracy')
        axes[idx].set_title(f'{activation.upper()} Activation')
        axes[idx].grid(True, alpha=0.3)
        axes[idx].set_ylim([0.5, 1.0])
    
    plt.suptitle('Activation Function Comparison', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
    
    # Print results
    print("\nActivation Function Performance:")
    print("-" * 50)
    for activation, metrics in results.items():
        print(f"{activation.upper():8s} - Accuracy: {metrics['final_accuracy']:.4f}, Loss: {metrics['final_loss']:.4f}")

compare_activation_functions(X, y)
[Figure: Activation Function Comparison]
Activation Function Performance:
--------------------------------------------------
RELU     - Accuracy: 1.0000, Loss: 0.0036
SIGMOID  - Accuracy: 0.8700, Loss: 0.2628
TANH     - Accuracy: 1.0000, Loss: 0.0052

10. Mini MNIST-like Example

In [24]:
# Create a simple digit classification problem (simplified MNIST)
from sklearn.datasets import load_digits

# Load digits dataset
digits = load_digits(n_class=5)  # Only use first 5 digits
X_digits = digits.data / 16.0  # Normalize to [0, 1]
y_digits = digits.target

# Convert to one-hot encoding
y_digits_onehot = np.zeros((y_digits.shape[0], 5))
y_digits_onehot[np.arange(y_digits.shape[0]), y_digits] = 1

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits_onehot, test_size=0.2, random_state=42
)

# Create network for digit classification
nn_digits = NeuralNetwork(
    layer_sizes=[64, 32, 16, 5],  # 64 input features, 5 output classes
    activation='relu',
    learning_rate=0.1
)

print("Training on digits dataset (0-4)...")
nn_digits.train(X_train, y_train, epochs=200, batch_size=32, verbose=False)

# Test accuracy
predictions = nn_digits.predict(X_test)
true_labels = np.argmax(y_test, axis=1)
test_accuracy = np.mean(predictions == true_labels)

print(f"Test Accuracy: {test_accuracy:.4f}")

# Visualize some predictions
fig, axes = plt.subplots(2, 5, figsize=(12, 5))

for i in range(10):
    ax = axes[i // 5, i % 5]
    
    # Reshape back to 8x8 image
    image = X_test[i].reshape(8, 8)
    ax.imshow(image, cmap='gray')
    
    pred = predictions[i]
    true = true_labels[i]
    
    color = 'green' if pred == true else 'red'
    ax.set_title(f'Pred: {pred}, True: {true}', color=color)
    ax.axis('off')

plt.suptitle('Digit Classification Results', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Plot confusion matrix
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(true_labels, predictions)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix', fontweight='bold')
plt.show()
Training on digits dataset (0-4)...
Test Accuracy: 0.9945
[Figure: Digit Classification Results]
[Figure: Confusion Matrix]

Summary and Conclusions

In this notebook, we built a complete neural network from scratch, with the forward pass, backpropagation, and training loop written in plain NumPy (scikit-learn and matplotlib were used only for data and visualization). Key accomplishments:

What We Implemented:

  1. Forward Propagation: Matrix multiplication through layers with activation functions
  2. Backpropagation: Gradient calculation and weight updates
  3. Multiple Activation Functions: ReLU, Sigmoid, Tanh with their derivatives
  4. Mini-batch Gradient Descent: Efficient training on batches
  5. Different Loss Functions: Binary and categorical cross-entropy
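
One step worth making explicit (an added note): with a sigmoid output $\hat{y} = \sigma(z)$ and binary cross-entropy $L = -[y\log\hat{y} + (1-y)\log(1-\hat{y})]$, the output-layer gradient collapses to

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial z} = \frac{\hat{y}-y}{\hat{y}(1-\hat{y})}\,\hat{y}(1-\hat{y}) = \hat{y}-y,$$

and the same simplification holds for a softmax output with categorical cross-entropy. This is why backward_propagation starts from dz = activations[-1] - y in both cases.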

Key Insights:

  • Weight Initialization: He initialization for ReLU, Xavier for others
  • Activation Functions: ReLU generally performs best for hidden layers
  • Network Depth: Deeper networks can learn more complex patterns
  • Learning Rate: Critical hyperparameter affecting convergence (a quick sweep is sketched below)
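
As a closing illustration (an added sketch; the specific rates are arbitrary), a quick sweep over learning rates on the moons data shows how strongly this single hyperparameter affects the final accuracy:

In [ ]:
for lr in [0.001, 0.01, 0.1, 1.0]:
    nn_lr = NeuralNetwork(layer_sizes=[2, 16, 8, 1], activation='relu', learning_rate=lr)
    nn_lr.train(X, y, epochs=100, batch_size=32, verbose=False)
    print(f"learning_rate={lr:<6} -> final validation accuracy: {nn_lr.accuracy_history[-1]:.4f}")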

Visualizations Created:

  • Decision boundaries showing how the network separates classes
  • Weight matrices revealing learned patterns
  • Activation patterns showing information flow
  • Training curves demonstrating convergence

Next Steps:

  • Implement advanced optimizers (Adam, RMSprop)
  • Add dropout and batch normalization
  • Implement convolutional layers
  • Try on real-world datasets

This implementation helps understand the fundamental mathematics behind deep learning frameworks like TensorFlow and PyTorch!