PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies Paper • 2510.16505 • Published Oct 18 • 3
TTRV: Test-Time Reinforcement Learning for Vision Language Models Paper • 2510.06783 • Published Oct 8 • 11
VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes Paper • 2509.25339 • Published Sep 29 • 9
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models Paper • 2410.06154 • Published Oct 8, 2024 • 16
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content Paper • 2410.10783 • Published Oct 14, 2024 • 27
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9, 2024 • 16