
INT8 Quantization

Per-channel quantize, scale/zero-point

Hard Advanced

Problem Description

Implement a post-training quantized linear layer using INT8 weights.

Signature

class Int8Linear(nn.Module):
    def __init__(self, weight: Tensor, bias: Tensor = None): ...
    def forward(self, x: Tensor) -> Tensor: ...

Quantization (per-channel)

1. scale = weight.abs().amax(dim=1, keepdim=True) / 127 (use amax, not max: Tensor.max(dim=...) returns a (values, indices) pair; keepdim=True lets the per-channel scale broadcast over the weight columns)

2. weight_int8 = round(weight / scale).clamp(-128, 127).to(int8)

3. Store as register_buffer (not trainable)

4. Forward: dequantize (int8.float() * scale) then matmul
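The four steps above can be sketched in isolation, outside the module wrapper (a minimal illustration of symmetric per-channel quantization on random data):

```python
import torch

# Weight matrix: rows = output channels
weight = torch.randn(8, 4)

# Step 1: one scale per output channel; keepdim=True so it broadcasts over columns.
# clamp(min=...) guards against an all-zero channel dividing by zero.
scale = (weight.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-10)

# Step 2: round to the nearest integer and clamp to the int8 range
weight_int8 = torch.round(weight / scale).clamp(-128, 127).to(torch.int8)

# Step 4: dequantize back to float before the matmul
weight_deq = weight_int8.float() * scale

# Rounding error is at most half a quantization step per element
print('max error:', (weight - weight_deq).abs().max().item())
```

Note the error bound: within each channel, |w - round(w/s)*s| <= s/2, so the quantization error never exceeds half the channel's scale.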

Template

Implement the class below. Use only basic PyTorch operations.

# ✏️ YOUR IMPLEMENTATION HERE
class Int8Linear(nn.Module):
    def __init__(self, weight, bias=None):
        super().__init__()
        pass  # quantize weight, register buffers

    def forward(self, x):
        pass  # dequantize and matmul

Test Your Implementation

Use this code to debug before submitting.

# 🧪 Debug
w = torch.randn(8, 4)
q = Int8Linear(w)
x = torch.randn(2, 4)
print('Output:', q(x).shape)
print('dtype:', q.weight_int8.dtype)
print('Max quant error:', (w - q.weight_int8.float() * q.scale).abs().max().item())

Reference Solution

Try solving it yourself first! Click below to reveal the solution.

# ✅ SOLUTION
class Int8Linear(nn.Module):
    def __init__(self, weight, bias=None):
        super().__init__()
        scale = weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.register_buffer('weight_int8',
                             torch.round(weight / (scale + 1e-10)).clamp(-128, 127).to(torch.int8))
        self.register_buffer('scale', scale)
        self.bias = nn.Parameter(bias.clone()) if bias is not None else None

    def forward(self, x):
        w = self.weight_int8.float() * self.scale
        out = x @ w.T
        if self.bias is not None:
            out = out + self.bias
        return out
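The solution uses symmetric quantization, so no zero-point is needed: zero maps to zero. For contrast with the "scale/zero-point" concept named above, here is a hedged sketch of asymmetric per-channel quantization, where a zero-point lets the full uint8 range cover an arbitrary [min, max] interval (illustration only, not part of the reference solution):

```python
import torch

w = torch.randn(8, 4)

# Asymmetric per-channel quantization: map each row's [min, max] onto [0, 255]
w_min = w.amin(dim=1, keepdim=True)
w_max = w.amax(dim=1, keepdim=True)
scale = (w_max - w_min) / 255.0

# zero_point is the uint8 code that represents real value 0.0
zero_point = torch.round(-w_min / scale).clamp(0, 255)

w_q = torch.round(w / scale + zero_point).clamp(0, 255).to(torch.uint8)
w_deq = (w_q.float() - zero_point) * scale
print('max error:', (w - w_deq).abs().max().item())
```

Asymmetric schemes waste none of the integer range when the data is skewed (e.g. post-ReLU activations), at the cost of extra zero-point arithmetic in the integer matmul.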

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
pip install torch-judge then use check("int8_quantization")

Key Concepts

Per-channel quantize, scale/zero-point
