Understanding Gradient Descent

December 28, 2025

A quick exploration of gradient descent - how it works and why it's the backbone of modern ML.

Problem

I wanted to deeply understand how gradient descent actually works under the hood, not just use it as a black box in PyTorch or TensorFlow.

Blocker

The math looked intimidating at first - partial derivatives, learning rates, convergence… Where do I even start?

Solution

The core idea is beautifully simple: take small steps in the direction that reduces error the most.

# Simple gradient descent implementation
def gradient_descent(grad, x, learning_rate=0.01, epochs=100):
    """Minimize a function by following its gradient downhill from x."""
    for _ in range(epochs):
        gradient = grad(x)  # derivative of loss w.r.t. x
        x = x - learning_rate * gradient  # update in the opposite direction
    return x

# For a simple quadratic: f(x) = x^2
# Gradient: f'(x) = 2x
# Starting at x=10, learning_rate=0.1
# x_new = 10 - 0.1 * 20 = 8
# x_new = 8 - 0.1 * 16 = 6.4
# ... converges to 0 (the minimum)
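The hand-worked iterates above can be reproduced in a few lines. This sketch inlines the update rule so it runs on its own:

```python
# Trace the first updates for f(x) = x^2, where f'(x) = 2x,
# starting at x = 10 with learning_rate = 0.1.
x, lr = 10.0, 0.1
for step in range(3):
    x = x - lr * (2 * x)  # same rule: x_new = x - lr * f'(x)
    print(f"step {step + 1}: x = {round(x, 4)}")
# step 1: x = 8.0, step 2: x = 6.4, step 3: x = 5.12
```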

Key insight: the gradient points in the direction of steepest increase, so stepping the opposite way decreases the loss fastest.
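That claim is easy to check numerically. This sketch uses f(x) = x^2 at x = 3 (an arbitrary choice): a step against the gradient lowers the loss, while a step along it raises the loss:

```python
# Compare a downhill step (against the gradient) with an uphill step.
f = lambda x: x * x     # loss
grad = lambda x: 2 * x  # its derivative

x, lr = 3.0, 0.1
downhill = f(x - lr * grad(x))  # step against the gradient
uphill = f(x + lr * grad(x))    # step along the gradient
print(downhill < f(x) < uphill)  # True: 5.76 < 9.0 < 12.96
```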

Next Steps

  • Implement batch vs stochastic gradient descent
  • Explore momentum and Adam optimizer
  • Apply to a real neural network from scratch
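For the momentum bullet, the classic heavy-ball update is a small change to the loop above: a velocity term accumulates past gradients so steps smooth out over time. A minimal sketch, not tied to any particular library's API:

```python
# Gradient descent with momentum: velocity accumulates gradient history,
# smoothing the path toward the minimum.
def momentum_descent(grad, x, learning_rate=0.1, beta=0.9, epochs=100):
    v = 0.0
    for _ in range(epochs):
        v = beta * v + grad(x)     # accumulate gradient history
        x = x - learning_rate * v  # step along the smoothed direction
    return x

# Same quadratic as before: the result approaches the minimum at 0.
print(momentum_descent(lambda x: 2 * x, x=10.0))
```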