Speaker:
Shihao Zhang
Institution:
UCSD
Time:
Monday, June 1, 2026 - 2:00pm to 3:00pm
Location:
340P Rowland Hall
Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs of modern deep neural networks, including large language models (LLMs). Among PTQ algorithms, GPTQ has emerged as a leading method due to its computational efficiency and strong empirical performance. Despite its widespread adoption, GPTQ lacks rigorous quantitative theoretical guarantees. In this talk, I will discuss the first quantitative error bounds for both deterministic and stochastic variants of GPTQ. I will also present a novel LoRA adaptation method for PTQ that can achieve near-optimal guarantees on layer-wise reconstruction error.
