When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published 21 days ago • 55
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published 29 days ago • 95
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 108
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 123
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28 • 139
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 104
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27 • 36
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 207
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward Paper • 2508.03686 • Published Aug 5 • 36
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1 • 62
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14 • 88
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards Paper • 2507.09104 • Published Jul 12 • 17
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing Paper • 2506.19848 • Published Jun 24 • 26
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Paper • 2505.23764 • Published May 29 • 3