Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20, 2025 • 28
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Paper • 2402.02750 • Published Feb 5, 2024 • 5
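As context for the KIVI entry above, here is a minimal NumPy sketch of tuning-free asymmetric 2-bit quantization applied to a cache tensor. The grouping axes (per-channel for keys, per-token for values) and the helper names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_2bit_asymmetric(x, axis):
    # Asymmetric 2-bit quantization along `axis`: each group is mapped
    # to integers in [0, 3] using a per-group zero point (the group min)
    # and a per-group scale (the group range divided by 3).
    mn = x.min(axis=axis, keepdims=True)
    mx = x.max(axis=axis, keepdims=True)
    scale = (mx - mn) / 3.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round((x - mn) / scale), 0, 3).astype(np.uint8)
    return q, scale, mn

def dequantize_2bit(q, scale, mn):
    # Reconstruct an approximation of the original tensor.
    return q.astype(np.float32) * scale + mn

# Toy key cache: (num_tokens, num_channels). KIVI-style per-channel
# grouping quantizes along the token axis (axis=0); per-token grouping
# for values would use axis=1. Both are assumptions for illustration.
keys = np.random.randn(16, 8).astype(np.float32)
q, scale, mn = quantize_2bit_asymmetric(keys, axis=0)
approx = dequantize_2bit(q, scale, mn)
```

With round-to-nearest, the reconstruction error of each element is bounded by half of its group's scale, which is why the quality loss stays controlled even at 2 bits.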
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 10 days ago • 33
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published Jun 10, 2024 • 28
Nemotron Speech Collection Open, state-of-the-art, production-ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 11 items • Updated 6 days ago • 46
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Dec 18, 2025 • 124
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 29 items • Updated Aug 14, 2025 • 32
VideoChat-R1 Collection VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning • 4 items • Updated Sep 28, 2025 • 9
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory. • 15 items • Updated Mar 12 • 218
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models Paper • 2504.02882 • Published Apr 2, 2025 • 7
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 207