Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published Jul 4 • 23
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published Jul 4 • 23
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published Jul 4 • 23 • 1
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 23
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published Mar 2 • 13
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published Mar 2 • 13
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
view article Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • Jan 31 • 51