SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention (arXiv:2509.24006, published Sep 28, 2025)
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT (arXiv:2509.19284, published Sep 23, 2025)
Accelerating Retrieval-Augmented Language Model Serving with Speculation (arXiv:2401.14021, published Jan 25, 2024)
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning (arXiv:2508.07101, published Aug 9, 2025)
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning (arXiv:2507.16784, published Jul 22, 2025)
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention (arXiv:2410.05076, published Oct 7, 2024)