Rotary position embedding: relative position via rotation
Hard · Attention
Implement RoPE — the position encoding used in LLaMA, GPT-NeoX, and most modern LLMs.
Split each D-dimensional vector into consecutive pairs. Rotate pair i (dimensions 2i and 2i+1) at position pos by the angle θ = pos / 10000^(2i/D).
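As a scalar sketch of the rotation above (toy values; the variable names are illustrative, not part of the exercise), here is one pair of a D-dimensional vector rotated by its angle:

```python
import math

D, pos, i = 8, 3, 0                      # toy head dim, position, pair index
x = [float(t) for t in range(D)]         # toy input vector
theta = pos / 10000 ** (2 * i / D)       # angle for pair i (= 3.0 when i = 0)
x0, x1 = x[2 * i], x[2 * i + 1]          # the i-th consecutive pair
rot0 = x0 * math.cos(theta) - x1 * math.sin(theta)
rot1 = x0 * math.sin(theta) + x1 * math.cos(theta)
# Rotation preserves the pair's norm; only its phase encodes `pos`.
```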
This makes dot(q_rot[i], k_rot[j]) depend only on i - j (relative position).
Implement the function below. Use only basic PyTorch operations.
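A minimal sketch of one possible implementation, using only basic PyTorch ops (the function name, signature, and `base` parameter are assumptions, not given by the exercise):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, D), D even.

    Pair j = (x[:, 2j], x[:, 2j+1]) at position pos is rotated by
    theta_j = pos / base**(2j/D).
    """
    seq_len, D = x.shape
    half = D // 2
    # Per-pair inverse frequencies: base**(-2j/D) for j = 0..half-1
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype) * 2 / D)
    pos = torch.arange(seq_len, dtype=x.dtype)
    angles = pos[:, None] * inv_freq[None, :]      # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[:, 0::2], x[:, 1::2]         # consecutive pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

A quick sanity check: apply `rope` to the same vector repeated at every position; dot products between rotated copies should match whenever the position offset matches.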