
Gradient Accumulation

Micro-batching, loss scaling

Easy Fundamentals

Problem Description

Implement a training step with gradient accumulation, simulating large batches with limited memory.

Signature

```python
def accumulated_step(model, optimizer, loss_fn, micro_batches) -> float:
    # micro_batches: list of (input, target) tuples
    # Returns: average loss (float)
```

Algorithm

1. optimizer.zero_grad()

2. For each (x, y) in micro_batches: loss = loss_fn(model(x), y) / len(micro_batches), then loss.backward()

3. optimizer.step()

4. Return the sum of the scaled losses (since each was divided by n, this equals the average micro-batch loss)

The key insight: dividing each loss by n before backward makes accumulated gradients equal to a single large-batch gradient.
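To see why, note that gradients add up across backward() calls, so summing the gradients of loss_i / n is the same as taking the gradient of the mean loss. The sketch below checks this numerically, assuming equal-size micro-batches (the equivalence only holds exactly when every micro-batch has the same size, since MSELoss averages over elements):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
ref = nn.Linear(4, 2)
ref.load_state_dict(model.state_dict())  # identical copy for comparison

xs = [torch.randn(2, 4) for _ in range(4)]
ys = [torch.randn(2, 2) for _ in range(4)]
loss_fn = nn.MSELoss()
n = len(xs)

# Accumulated: scale each micro-batch loss by 1/n, backward without stepping
model.zero_grad()
for x, y in zip(xs, ys):
    (loss_fn(model(x), y) / n).backward()

# Single large batch on the reference copy
ref.zero_grad()
loss_fn(ref(torch.cat(xs)), torch.cat(ys)).backward()

# Gradients should agree up to floating-point error
for p, q in zip(model.parameters(), ref.parameters()):
    assert torch.allclose(p.grad, q.grad, atol=1e-5)
print("gradients match")
```

Without the division by n, the accumulated gradient would be n times too large, which is equivalent to silently multiplying the learning rate by n.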

Template

Implement the function below. Use only basic PyTorch operations.

# โœ๏ธ YOUR IMPLEMENTATION HERE def accumulated_step(model, optimizer, loss_fn, micro_batches): pass # zero_grad, loop (forward, scale loss, backward), step

Test Your Implementation

Use this code to debug before submitting.

```python
# 🧪 Debug
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss = accumulated_step(model, opt, nn.MSELoss(),
                        [(torch.randn(2, 4), torch.randn(2, 2)) for _ in range(4)])
print('Loss:', loss)
```

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

```python
# ✅ SOLUTION
def accumulated_step(model, optimizer, loss_fn, micro_batches):
    optimizer.zero_grad()
    total_loss = 0.0
    n = len(micro_batches)
    for x, y in micro_batches:
        loss = loss_fn(model(x), y) / n  # scale so gradients sum to the large-batch gradient
        loss.backward()                  # gradients accumulate in .grad across calls
        total_loss += loss.item()
    optimizer.step()                     # one update after all micro-batches
    return total_loss
```

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge, then call check("gradient_accumulation")

Key Concepts

Micro-batching, loss scaling

