L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling Paper • 2503.04725 • Published Mar 6