InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published 4 days ago • 94
QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published 12 days ago • 31
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published 23 days ago • 25
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG Paper • 2510.03663 • Published 28 days ago • 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29 • 136
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29 • 18
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 76
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 137
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 75
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning Paper • 2506.00555 • Published May 31 • 1
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models Paper • 2506.17667 • Published Jun 21 • 3