
INT8 Quantization

Per-channel quantize, scale/zero-point

Hard Advanced

Problem Description

Implement a post-training quantized linear layer using INT8 weights.

Signature

class Int8Linear(nn.Module):
    def __init__(self, weight: Tensor, bias: Tensor = None): ...
    def forward(self, x: Tensor) -> Tensor: ...

Quantization (per-channel)

1. scale = weight.abs().amax(dim=1, keepdim=True) / 127 (use amax, not max: Tensor.max(dim=...) returns a (values, indices) pair; keepdim=True lets the per-channel scale broadcast over the weight columns)

2. weight_int8 = round(weight / scale).clamp(-128, 127).to(int8)

3. Store as register_buffer (not trainable)

4. Forward: dequantize (int8.float() * scale) then matmul
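The four steps above can be sketched in isolation, outside the module wrapper (a minimal illustration of symmetric per-channel quantization on random data):

```python
import torch

# Weight matrix: rows = output channels
weight = torch.randn(8, 4)

# Step 1: one scale per output channel; keepdim=True so it broadcasts over columns.
# clamp(min=...) guards against an all-zero channel dividing by zero.
scale = (weight.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-10)

# Step 2: round to the nearest integer and clamp to the int8 range
weight_int8 = torch.round(weight / scale).clamp(-128, 127).to(torch.int8)

# Step 4: dequantize back to float before the matmul
weight_deq = weight_int8.float() * scale

# Rounding error is at most half a quantization step per element
print('max error:', (weight - weight_deq).abs().max().item())
```

Note the error bound: within each channel, |w - round(w/s)*s| <= s/2, so the quantization error never exceeds half the channel's scale.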

Template

Implement the class below. Use only basic PyTorch operations.

# ✏️ YOUR IMPLEMENTATION HERE
class Int8Linear(nn.Module):
    def __init__(self, weight, bias=None):
        super().__init__()
        pass  # quantize weight, register buffers

    def forward(self, x):
        pass  # dequantize and matmul

Test Your Implementation

Use this code to debug before submitting.

# 🧪 Debug
w = torch.randn(8, 4)
q = Int8Linear(w)
x = torch.randn(2, 4)
print('Output:', q(x).shape)
print('dtype:', q.weight_int8.dtype)
print('Max quant error:', (w - q.weight_int8.float() * q.scale).abs().max().item())

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

# ✅ SOLUTION
class Int8Linear(nn.Module):
    def __init__(self, weight, bias=None):
        super().__init__()
        scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer('weight_int8',
                             torch.round(weight / (scale + 1e-10)).clamp(-128, 127).to(torch.int8))
        self.register_buffer('scale', scale)
        self.bias = nn.Parameter(bias.clone()) if bias is not None else None

    def forward(self, x):
        w = self.weight_int8.float() * self.scale
        out = x @ w.T
        if self.bias is not None:
            out = out + self.bias
        return out
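The solution uses symmetric quantization, so no zero-point is needed: zero maps to zero. For contrast with the "scale/zero-point" concept named above, here is a hedged sketch of asymmetric per-channel quantization, where a zero-point lets the full uint8 range cover an arbitrary [min, max] interval (illustration only, not part of the reference solution):

```python
import torch

w = torch.randn(8, 4)

# Asymmetric per-channel quantization: map each row's [min, max] onto [0, 255]
w_min = w.amin(dim=1, keepdim=True)
w_max = w.amax(dim=1, keepdim=True)
scale = (w_max - w_min) / 255.0

# zero_point is the uint8 code that represents real value 0.0
zero_point = torch.round(-w_min / scale).clamp(0, 255)

w_q = torch.round(w / scale + zero_point).clamp(0, 255).to(torch.uint8)
w_deq = (w_q.float() - zero_point) * scale
print('max error:', (w - w_deq).abs().max().item())
```

Asymmetric schemes waste none of the integer range when the data is skewed (e.g. post-ReLU activations), at the cost of extra zero-point arithmetic in the integer matmul.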

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge then use check("int8_quantization")

Key Concepts

Per-channel quantize, scale/zero-point
