Following integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
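As a rough illustration of what that deployment path can look like, here is a minimal offline-inference sketch using SGLang's Python engine. The model path is a placeholder for any AutoRound-quantized checkpoint, and the exact engine arguments may vary across SGLang versions:

```python
# Minimal sketch: run an AutoRound-quantized model with SGLang's offline engine.
# "your-org/your-model-autoround-int4" is a placeholder, not a real checkpoint.
import sglang as sgl

if __name__ == "__main__":
    llm = sgl.Engine(model_path="your-org/your-model-autoround-int4")
    outputs = llm.generate(
        ["Explain weight-only quantization in one sentence."],
        {"temperature": 0.0, "max_new_tokens": 64},
    )
    print(outputs[0]["text"])
    llm.shutdown()
```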
💡 We’ve also enhanced the RTN mode (`--iters 0`), cutting quantization costs significantly for low-resource users.
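For reference, a minimal sketch of RTN mode through the Python API is shown below; the model name is a small placeholder chosen for illustration, and parameter defaults may differ between releases. Setting `iters=0` skips the tuning loop, which is what makes this mode so cheap:

```python
# Sketch: RTN-style quantization by disabling AutoRound's tuning iterations.
# "facebook/opt-125m" is just a small placeholder model for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# iters=0 falls back to round-to-nearest (RTN), trading some accuracy
# for a much cheaper quantization pass on low-resource hardware.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-autoround-rtn")
```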
⭐ Star our repo and stay tuned for more exciting updates!