BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21 • 82
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way Paper • 2312.00407 • Published Dec 1, 2023 • 3
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning Paper • 2402.06332 • Published Feb 9, 2024 • 20