Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3 • 45
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published Feb 21 • 15
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models Paper • 2411.03884 • Published Nov 6, 2024 • 28