Provable Post Training Quantization and Its Error Compensation for Large Language Models

Combinatorics and Probability

Speaker:

Shihao Zhang

Institution:

UCSD

Time:

Monday, June 1, 2026 - 2:00pm to 3:00pm

Location:

340P Rowland Hall

Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs of modern deep neural networks, including large language models (LLMs). Among PTQ algorithms, GPTQ has emerged as a leading method because of its computational efficiency and strong empirical performance. Despite its widespread adoption, GPTQ lacks rigorous quantitative theoretical guarantees. In this talk, I will discuss the first quantitative error bounds for both the deterministic and stochastic variants of GPTQ. We also propose a novel low-rank adaptation (LoRA) method for PTQ that can achieve near-optimal guarantees on the layer-wise reconstruction error.
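For readers unfamiliar with GPTQ, here is a minimal pure-Python sketch of the GPTQ-style update as it is commonly described. Each weight is quantized one at a time, and the quantization error is then pushed onto the not-yet-quantized weights using the inverse of H = X^T X, where X holds calibration inputs. The sketch supports either deterministic round-to-nearest or unbiased stochastic rounding. It is not the speaker's analysis or code. The uniform grid with spacing `step`, the small diagonal damping `damp`, and all function names are hypothetical choices for illustration. It also handles a single output row and omits GPTQ's blocking and Cholesky-based speedups.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def round_nearest(x, step, rng=None):
    """Deterministic rounding onto the (hypothetical) uniform grid step * Z."""
    return step * round(x / step)


def round_stochastic(x, step, rng):
    """Unbiased stochastic rounding onto step * Z: the expected result equals x."""
    lo = step * math.floor(x / step)
    p_up = (x - lo) / step
    return lo + step if rng.random() < p_up else lo


def layer_error(X, w, q):
    """Layer-wise reconstruction error ||X w - X q||^2 over calibration rows X."""
    return sum(sum(xi * (wi - qi) for xi, wi, qi in zip(x, w, q)) ** 2 for x in X)


def gptq_row(w, X, step, rounder=round_nearest, rng=None, damp=1e-6):
    """Quantize one weight row column by column with GPTQ-style error compensation.

    `damp` is an assumed small ridge term that keeps H = X^T X invertible.
    Pass an `rng` (e.g. random.Random) when using round_stochastic.
    """
    d = len(w)
    H = [[sum(x[i] * x[j] for x in X) + (damp if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    Hinv = invert(H)
    w = list(w)  # working copy that absorbs the compensations
    q = [0.0] * d
    for j in range(d):
        q[j] = rounder(w[j], step, rng)
        err = (w[j] - q[j]) / Hinv[j][j]
        # Spread the error of coordinate j over the remaining coordinates.
        for k in range(j + 1, d):
            w[k] -= err * Hinv[k][j]
        # Remove coordinate j from the inverse Hessian (rank-one downdate).
        for a in range(j + 1, d):
            for b in range(j + 1, d):
                Hinv[a][b] -= Hinv[a][j] * Hinv[j][b] / Hinv[j][j]
    return q


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # toy calibration inputs, X^T X = [[2, 1], [1, 2]]
    w = [0.4, 0.4]
    q_rtn = [round_nearest(v, 1.0) for v in w]
    q_gptq = gptq_row(w, X, 1.0)
    q_stoch = gptq_row(w, X, 1.0, rounder=round_stochastic, rng=random.Random(0))
    print(q_rtn, layer_error(X, w, q_rtn))    # [0.0, 0.0] and about 0.96
    print(q_gptq, layer_error(X, w, q_gptq))  # [0.0, 1.0] and about 0.56
    print(q_stoch, layer_error(X, w, q_stoch))
```

On this toy input, round-to-nearest maps both weights to 0. The error-compensated pass instead rounds the second weight up after absorbing part of the first weight's error, which lowers ||Xw - Xq||^2 from about 0.96 to about 0.56.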