AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios Paper • 2505.16944 • Published May 22, 2025 • 8
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25, 2025 • 32
The Era of Agentic Organization: Learning to Organize with Language Models Paper • 2510.26658 • Published Oct 30, 2025 • 27