
RoPE

Rotary position embedding, relative position via rotation

Difficulty: Hard · Topic: Attention

Problem Description

Implement RoPE — the position encoding used in LLaMA, GPT-NeoX, and most modern LLMs.

Signature

def apply_rope(q: Tensor, k: Tensor) -> tuple[Tensor, Tensor]:
    # q, k: (B, S, D) where D is even
    # Returns rotated (q, k) with the same shape

Key Idea

Split each vector into consecutive pairs. At position pos, rotate the i-th pair by θ = pos / 10000^(2i/D):

[x_0, x_1] → [x_0*cosθ - x_1*sinθ, x_0*sinθ + x_1*cosθ]

This makes dot(q_rot[i], k_rot[j]) depend only on i - j (relative position).
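This property can be checked numerically. The sketch below (illustrative, not part of the problem) rotates single vectors with a standalone helper `rotate_pairs` (a hypothetical name) and confirms that the dot product is unchanged when both positions shift by the same amount:

```python
import torch

def rotate_pairs(x, pos, D):
    # Angle for the i-th pair: pos / 10000^(2i/D); here i runs over even dims.
    i = torch.arange(0, D, 2).float()
    theta = pos / (10000.0 ** (i / D))
    x1, x2 = x[0::2], x[1::2]
    return torch.stack([x1 * torch.cos(theta) - x2 * torch.sin(theta),
                        x1 * torch.sin(theta) + x2 * torch.cos(theta)],
                       dim=-1).flatten()

D = 8
q, k = torch.randn(D), torch.randn(D)
# Same relative offset (j - i = 2) at two different absolute positions:
a = rotate_pairs(q, 3, D) @ rotate_pairs(k, 5, D)
b = rotate_pairs(q, 10, D) @ rotate_pairs(k, 12, D)
print(torch.allclose(a, b, atol=1e-5))
```

Both dot products agree because rotating q by θ(i) and k by θ(j) is equivalent to rotating their relative angle by θ(j) − θ(i), which depends only on j − i.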

Template

Implement the function below. Use only basic PyTorch operations.

# ✏️ YOUR IMPLEMENTATION HERE
def apply_rope(q, k):
    # 1. Compute position angles
    # 2. Split into even/odd pairs
    # 3. Apply rotation
    pass

Test Your Implementation

Use this code to debug before submitting.

# 🧪 Debug
import torch

q = torch.randn(1, 8, 16)
k = torch.randn(1, 8, 16)
qr, kr = apply_rope(q, k)
print('Shape preserved:', qr.shape == q.shape)
print('Norm preserved:', torch.allclose(q.norm(dim=-1), qr.norm(dim=-1), atol=1e-4))

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

# ✅ SOLUTION
import torch

def apply_rope(q, k):
    B, S, D = q.shape
    pos = torch.arange(S, device=q.device).unsqueeze(1).float()  # (S, 1)
    dim = torch.arange(0, D, 2, device=q.device).float()         # (D/2,)
    freqs = 1.0 / (10000.0 ** (dim / D))
    angles = pos * freqs                                         # (S, D/2)
    cos_a = torch.cos(angles)
    sin_a = torch.sin(angles)

    def rotate(x):
        x1, x2 = x[..., 0::2], x[..., 1::2]
        return torch.stack([x1 * cos_a - x2 * sin_a,
                            x1 * sin_a + x2 * cos_a], dim=-1).flatten(-2)

    return rotate(q), rotate(k)
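An equivalent formulation treats each (even, odd) pair as one complex number and multiplies it by e^(i·angle). The sketch below assumes the same interleaved pair layout as the solution above; `apply_rope_complex` is a hypothetical name for illustration:

```python
import torch

def apply_rope_complex(q, k):
    B, S, D = q.shape
    pos = torch.arange(S, device=q.device).unsqueeze(1).float()
    dim = torch.arange(0, D, 2, device=q.device).float()
    freqs = 1.0 / (10000.0 ** (dim / D))
    # Unit-magnitude complex rotations e^{i*angle}, shape (S, D/2)
    rot = torch.polar(torch.ones(S, D // 2, device=q.device), pos * freqs)

    def rotate(x):
        # View pairs (x1, x2) as complex numbers x1 + i*x2
        xc = torch.view_as_complex(x.float().reshape(B, S, D // 2, 2))
        # Complex multiply applies the 2D rotation per pair
        return torch.view_as_real(xc * rot).flatten(-2)

    return rotate(q), rotate(k)
```

Complex multiplication (x1 + i·x2)(cosθ + i·sinθ) expands to exactly the pairwise rotation formula, so both versions compute the same result.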

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge
then call check("rope") to grade your implementation.

Key Concepts

Rotary position embedding, relative position via rotation
