DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking Paper • 2510.20168 • Published Oct 23 • 27
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application Paper • 2510.19631 • Published Oct 22 • 27
Open LLM Leaderboard • 13.7k • Track, rank and evaluate open LLMs and chatbots