
RoPE

Rotary position embedding, relative position via rotation

Difficulty: Hard · Topic: Attention

Problem Description

Implement RoPE — the position encoding used in LLaMA, GPT-NeoX, and most modern LLMs.

Signature

def apply_rope(q: Tensor, k: Tensor) -> tuple[Tensor, Tensor]:
    # q, k: (B, S, D) where D is even
    # Returns rotated (q, k) with the same shape

Key Idea

Split each vector into consecutive pairs. At position pos, rotate the i-th pair by θ = pos / 10000^(2i/D):

[x_0, x_1] → [x_0*cosθ - x_1*sinθ, x_0*sinθ + x_1*cosθ]

This makes dot(q_rot[i], k_rot[j]) depend only on i - j (relative position).
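This property can be checked numerically. The sketch below (illustrative, not part of the problem) rotates single vectors with a standalone helper `rotate_pairs` (a hypothetical name) and confirms that the dot product is unchanged when both positions shift by the same amount:

```python
import torch

def rotate_pairs(x, pos, D):
    # Angle for the i-th pair: pos / 10000^(2i/D); here i runs over even dims.
    i = torch.arange(0, D, 2).float()
    theta = pos / (10000.0 ** (i / D))
    x1, x2 = x[0::2], x[1::2]
    return torch.stack([x1 * torch.cos(theta) - x2 * torch.sin(theta),
                        x1 * torch.sin(theta) + x2 * torch.cos(theta)],
                       dim=-1).flatten()

D = 8
q, k = torch.randn(D), torch.randn(D)
# Same relative offset (j - i = 2) at two different absolute positions:
a = rotate_pairs(q, 3, D) @ rotate_pairs(k, 5, D)
b = rotate_pairs(q, 10, D) @ rotate_pairs(k, 12, D)
print(torch.allclose(a, b, atol=1e-5))
```

Both dot products agree because rotating q by θ(i) and k by θ(j) is equivalent to rotating their relative angle by θ(j) − θ(i), which depends only on j − i.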

Template

Implement the function below. Use only basic PyTorch operations.

# ✏️ YOUR IMPLEMENTATION HERE
def apply_rope(q, k):
    # 1. Compute position angles
    # 2. Split into even/odd pairs
    # 3. Apply rotation
    pass

Test Your Implementation

Use this code to debug before submitting.

# 🧪 Debug
import torch

q = torch.randn(1, 8, 16)
k = torch.randn(1, 8, 16)
qr, kr = apply_rope(q, k)
print('Shape preserved:', qr.shape == q.shape)
print('Norm preserved:', torch.allclose(q.norm(dim=-1), qr.norm(dim=-1), atol=1e-4))

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

# ✅ SOLUTION
import torch

def apply_rope(q, k):
    B, S, D = q.shape
    pos = torch.arange(S, device=q.device).unsqueeze(1).float()  # (S, 1)
    dim = torch.arange(0, D, 2, device=q.device).float()         # (D/2,)
    freqs = 1.0 / (10000.0 ** (dim / D))
    angles = pos * freqs                                         # (S, D/2)
    cos_a = torch.cos(angles)
    sin_a = torch.sin(angles)

    def rotate(x):
        x1, x2 = x[..., 0::2], x[..., 1::2]
        return torch.stack([x1 * cos_a - x2 * sin_a,
                            x1 * sin_a + x2 * cos_a], dim=-1).flatten(-2)

    return rotate(q), rotate(k)
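An equivalent formulation treats each (even, odd) pair as one complex number and multiplies it by e^(i·angle). The sketch below assumes the same interleaved pair layout as the solution above; `apply_rope_complex` is a hypothetical name for illustration:

```python
import torch

def apply_rope_complex(q, k):
    B, S, D = q.shape
    pos = torch.arange(S, device=q.device).unsqueeze(1).float()
    dim = torch.arange(0, D, 2, device=q.device).float()
    freqs = 1.0 / (10000.0 ** (dim / D))
    # Unit-magnitude complex rotations e^{i*angle}, shape (S, D/2)
    rot = torch.polar(torch.ones(S, D // 2, device=q.device), pos * freqs)

    def rotate(x):
        # View pairs (x1, x2) as complex numbers x1 + i*x2
        xc = torch.view_as_complex(x.float().reshape(B, S, D // 2, 2))
        # Complex multiply applies the 2D rotation per pair
        return torch.view_as_real(xc * rot).flatten(-2)

    return rotate(q), rotate(k)
```

Complex multiplication (x1 + i·x2)(cosθ + i·sinθ) expands to exactly the pairwise rotation formula, so both versions compute the same result.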

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge
then call check("rope") to grade your implementation.

Key Concepts

Rotary position embedding, relative position via rotation
