LaSeR: Reinforcement Learning with Last-Token Self-Rewarding • Paper • 2510.14943 • Published 16 days ago • 37
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective • Paper • 2506.17930 • Published Jun 22 • 19
ReDit: Reward Dithering for Improved LLM Policy Optimization • Paper • 2506.18631 • Published Jun 23 • 7