After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
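As a minimal sketch, here is how an AutoRound-quantized checkpoint could be served with SGLang's offline `Engine` API; the model ID below is a placeholder, not a verified release:

```python
# Sketch: running an AutoRound-quantized model through SGLang's offline engine.
# The model path is a hypothetical placeholder -- substitute your own
# AutoRound-quantized checkpoint or a Hub model ID.
import sglang as sgl

llm = sgl.Engine(model_path="Intel/Qwen2.5-7B-Instruct-int4-AutoRound")  # placeholder ID

outputs = llm.generate(
    ["What is LLM quantization?"],
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(outputs[0]["text"])

llm.shutdown()
```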
💡 We've also enhanced RTN mode (--iters 0), which skips the iterative tuning step entirely and significantly cuts quantization cost for low-resource users.
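A minimal sketch of RTN-mode quantization via the Python API, assuming a recent AutoRound release; `iters=0` mirrors the `--iters 0` CLI flag, and the model name and output path are placeholders:

```python
# Sketch: round-to-nearest (RTN) quantization with AutoRound.
# iters=0 disables the tuning loop, so quantization is fast and cheap.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize_and_save("./qwen2.5-7b-w4g128-rtn", format="auto_round")
```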
⭐ Star our repo and stay tuned for more exciting updates!
AutoRound keeps evolving its LLM quantization algorithm! 🚀 After enhancing W2A16 quantization, we now offer a fast algorithm that generates mixed bit/data-type schemes (~2 minutes for 8B models), which works well for MXFP4 and W2A16. Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
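The linked doc describes the AutoScheme interface in detail; the sketch below is based on it, and the constructor arguments (`avg_bits`, `options`) and model ID are assumptions rather than verified signatures, so consult the doc for the exact API:

```python
# Hedged sketch: generating a mixed bit/data-type scheme with AutoScheme.
# Argument names and the model ID are assumptions -- see the step_by_step doc.
from auto_round import AutoRound, AutoScheme

scheme = AutoScheme(avg_bits=2.5, options=("W2A16", "W4A16"))  # assumed signature
autoround = AutoRound("Qwen/Qwen3-8B", scheme=scheme)  # placeholder model
autoround.quantize_and_save("./qwen3-8b-mixed")
```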