
Gradient Accumulation

Micro-batching, loss scaling

Easy Fundamentals

Problem Description

Implement a training step with gradient accumulation, simulating large batches with limited memory.

Signature

```python
def accumulated_step(model, optimizer, loss_fn, micro_batches) -> float:
    # micro_batches: list of (input, target) tuples
    # Returns: average loss (float)
```

Algorithm

1. optimizer.zero_grad()

2. For each (x, y) in micro_batches: loss = loss_fn(model(x), y) / len(micro_batches), then loss.backward()

3. optimizer.step()

4. Return the sum of the scaled losses (since each was divided by n, this equals the average micro-batch loss)

The key insight: dividing each loss by n before backward makes accumulated gradients equal to a single large-batch gradient.
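To see why, note that gradients add up across backward() calls, so summing the gradients of loss_i / n is the same as taking the gradient of the mean loss. The sketch below checks this numerically, assuming equal-size micro-batches (the equivalence only holds exactly when every micro-batch has the same size, since MSELoss averages over elements):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
ref = nn.Linear(4, 2)
ref.load_state_dict(model.state_dict())  # identical copy for comparison

xs = [torch.randn(2, 4) for _ in range(4)]
ys = [torch.randn(2, 2) for _ in range(4)]
loss_fn = nn.MSELoss()
n = len(xs)

# Accumulated: scale each micro-batch loss by 1/n, backward without stepping
model.zero_grad()
for x, y in zip(xs, ys):
    (loss_fn(model(x), y) / n).backward()

# Single large batch on the reference copy
ref.zero_grad()
loss_fn(ref(torch.cat(xs)), torch.cat(ys)).backward()

# Gradients should agree up to floating-point error
for p, q in zip(model.parameters(), ref.parameters()):
    assert torch.allclose(p.grad, q.grad, atol=1e-5)
print("gradients match")
```

Without the division by n, the accumulated gradient would be n times too large, which is equivalent to silently multiplying the learning rate by n.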

Template

Implement the function below. Use only basic PyTorch operations.

# โœ๏ธ YOUR IMPLEMENTATION HERE def accumulated_step(model, optimizer, loss_fn, micro_batches): pass # zero_grad, loop (forward, scale loss, backward), step

Test Your Implementation

Use this code to debug before submitting.

```python
# 🧪 Debug
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss = accumulated_step(model, opt, nn.MSELoss(),
                        [(torch.randn(2, 4), torch.randn(2, 2)) for _ in range(4)])
print('Loss:', loss)
```

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

```python
# ✅ SOLUTION
def accumulated_step(model, optimizer, loss_fn, micro_batches):
    optimizer.zero_grad()
    total_loss = 0.0
    n = len(micro_batches)
    for x, y in micro_batches:
        loss = loss_fn(model(x), y) / n  # scale so gradients sum to the large-batch gradient
        loss.backward()                  # gradients accumulate in .grad across calls
        total_loss += loss.item()
    optimizer.step()                     # one update after all micro-batches
    return total_loss
```

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge, then call check("gradient_accumulation")

Key Concepts

Micro-batching, loss scaling

