Accept/reject, draft model acceleration
Hard | Inference
Implement the acceptance/rejection step of speculative decoding, a technique for accelerating LLM inference.
For each position i = 0, ..., K-1:
1. ratio = target_probs[i, token_i] / draft_probs[i, token_i]
2. Accept token_i with probability min(1, ratio): append it and continue to the next position
3. If rejected: sample a replacement token from normalize(max(0, target_probs[i] - draft_probs[i])), append it, and stop
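A short check of why these steps reproduce the target distribution may help. The symbols here are assumptions for illustration and are not part of the exercise: p stands for target_probs[i], q for draft_probs[i], and the proposed token x is assumed to have been drawn from q.

```latex
\begin{align*}
\beta &= 1 - \sum_y \min\bigl(p(y), q(y)\bigr)
       = \sum_y \max\bigl(0,\, p(y) - q(y)\bigr)
       && \text{(probability of rejection)} \\
P(\text{emit } x)
      &= q(x)\,\min\!\Bigl(1, \tfrac{p(x)}{q(x)}\Bigr)
       + \beta \cdot \frac{\max\bigl(0,\, p(x) - q(x)\bigr)}{\sum_y \max\bigl(0,\, p(y) - q(y)\bigr)} \\
      &= \min\bigl(p(x), q(x)\bigr) + \max\bigl(0,\, p(x) - q(x)\bigr) \\
      &= p(x)
\end{align*}
```

The first term covers accepted draft tokens and the second covers resampled tokens after a rejection, so each emitted token is distributed according to the target model.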
Implement the function below. Use only basic PyTorch operations.
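The original function template is not preserved on this page, so the following is only a minimal sketch of the three steps above. The function name `speculative_accept`, its argument names, the tensor shapes, and the optional `generator` argument are all assumptions made for illustration, not the exercise's actual signature.

```python
import torch


def speculative_accept(target_probs, draft_probs, draft_tokens, generator=None):
    """Hypothetical signature (assumed, not taken from the exercise).

    target_probs, draft_probs: float tensors of shape (K, vocab_size)
    draft_tokens: long tensor of shape (K,), tokens proposed by the draft model
    Returns a Python list of token ids.
    """
    accepted = []
    for i in range(draft_tokens.shape[0]):
        tok = int(draft_tokens[i])
        # Step 1: ratio of target to draft probability for the proposed token.
        ratio = target_probs[i, tok] / draft_probs[i, tok]
        # Step 2: accept with probability min(1, ratio); u is uniform on [0, 1).
        u = torch.rand((), generator=generator)
        if u < torch.clamp(ratio, max=1.0):
            accepted.append(tok)
            continue
        # Step 3: on rejection, resample from the normalized positive residual,
        # append the replacement token, and stop.
        residual = torch.clamp(target_probs[i] - draft_probs[i], min=0.0)
        residual = residual / residual.sum()
        new_tok = int(torch.multinomial(residual, 1, generator=generator))
        accepted.append(new_tok)
        break
    return accepted
```

This sketch assumes every proposed token has nonzero draft probability, which holds when the tokens were actually sampled from the draft model. It returns only the tokens covered by the steps above; any extra token sampled when all K proposals are accepted is not described here and is left out.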
For interactive practice with auto-grading, run TorchCode locally: pip install torch-judge, then use check("speculative_decoding")