VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published 9 days ago • 104
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published 12 days ago • 37
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published Jul 27 • 14
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 302
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 25
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 134
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 82