DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance Paper • 2505.14708 • Published May 17, 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 39
The Geometry of Reasoning: Flowing Logics in Representation Space Paper • 2510.09782 • Published Oct 10, 2025 • 6
Why Do Transformers Fail to Forecast Time Series In-Context? Paper • 2510.09776 • Published Oct 10, 2025 • 2
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers Paper • 2412.12444 • Published Dec 17, 2024
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs Paper • 2506.00577 • Published May 31, 2025 • 11
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23, 2024 • 24