After integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
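As a minimal sketch, here is how an AutoRound-quantized checkpoint could be served with SGLang's offline `Engine` API; the model ID below is a placeholder, not a verified release:

```python
# Sketch: running an AutoRound-quantized model through SGLang's offline engine.
# The model path is a hypothetical placeholder -- substitute your own
# AutoRound-quantized checkpoint or a Hub model ID.
import sglang as sgl

llm = sgl.Engine(model_path="Intel/Qwen2.5-7B-Instruct-int4-AutoRound")  # placeholder ID

outputs = llm.generate(
    ["What is LLM quantization?"],
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(outputs[0]["text"])

llm.shutdown()
```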
💡 We've also enhanced RTN mode (--iters 0), which skips the iterative tuning step entirely and significantly cuts quantization cost for low-resource users.
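A minimal sketch of RTN-mode quantization via the Python API, assuming a recent AutoRound release; `iters=0` mirrors the `--iters 0` CLI flag, and the model name and output path are placeholders:

```python
# Sketch: round-to-nearest (RTN) quantization with AutoRound.
# iters=0 disables the tuning loop, so quantization is fast and cheap.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize_and_save("./qwen2.5-7b-w4g128-rtn", format="auto_round")
```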
⭐ Star our repo and stay tuned for more exciting updates!
AutoRound keeps evolving its LLM quantization algorithm! 🚀 After enhancing W2A16 quantization, we now offer a fast algorithm that generates mixed bit/data-type schemes (~2 minutes for 8B models), which works well for MXFP4 and W2A16. Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
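The linked doc describes the AutoScheme interface in detail; the sketch below is based on it, and the constructor arguments (`avg_bits`, `options`) and model ID are assumptions rather than verified signatures, so consult the doc for the exact API:

```python
# Hedged sketch: generating a mixed bit/data-type scheme with AutoScheme.
# Argument names and the model ID are assumptions -- see the step_by_step doc.
from auto_round import AutoRound, AutoScheme

scheme = AutoScheme(avg_bits=2.5, options=("W2A16", "W4A16"))  # assumed signature
autoround = AutoRound("Qwen/Qwen3-8B", scheme=scheme)  # placeholder model
autoround.quantize_and_save("./qwen3-8b-mixed")
```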