Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? Paper • 2510.00537 • Published 28 days ago
AERO: Softmax-Only LLMs for Efficient Private Inference Paper • 2410.13060 • Published Oct 16, 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models Paper • 2410.09637 • Published Oct 12, 2024