BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024