Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10
Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input Paper • 2412.01250 • Published Dec 2, 2024
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 2 days ago