Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning • Paper • 2506.04723 • Published Jun 5
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild • Paper • 2510.14240 • Published Oct 16
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms • Paper • 2510.13913 • Published Oct 15
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows • Paper • 2506.03332 • Published Jun 3
Demystifying Domain-adaptive Post-training for Financial LLMs • Paper • 2501.04961 • Published Jan 9