KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 20
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 271
SLiC-HF: Sequence Likelihood Calibration with Human Feedback Paper • 2305.10425 • Published May 17, 2023 • 6
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 48
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 178
panda-gym: Open-source goal-conditioned environments for robotic learning Paper • 2106.13687 • Published Jun 25, 2021 • 3
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5, 2024 • 7
Distributional Preference Alignment of LLMs via Optimal Transport Paper • 2406.05882 • Published Jun 9, 2024 • 2