MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28 • 170
Running 3.42k 3.42k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15 • 64
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Paper • 2502.14282 • Published Feb 20 • 29
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 47
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 422
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published Jan 8 • 94
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published Jan 7 • 81