Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 2 days ago • 79 • 3
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 87 • 4
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 109 • 8
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 8 days ago • 48 • 4
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 8 days ago • 14 • 5
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 14 days ago • 28 • 4
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 8 days ago • 57 • 5
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 13 days ago • 107 • 9
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25 • 181 • 7
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 13 days ago • 81 • 4
WorldGen: From Text to Traversable and Interactive 3D Worlds Paper • 2511.16825 • Published Nov 20 • 23 • 4
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 22 days ago • 115 • 3
Generalist Foundation Models Are Not Clinical Enough for Hospital Operations Paper • 2511.13703 • Published Nov 17 • 21 • 3