Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Paper • 2505.20686 • Published • 2
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 6
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 12
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Paper • 2410.04612 • Published
Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
8B • Updated • 1
Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
8B • Updated • 1
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 19 • 2
models
20
Cornell-AGI/apo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 3
Cornell-AGI/ppo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 26
Cornell-AGI/rebel_math_qwen2.5_1.5b
Text Generation • 2B • Updated
Cornell-AGI/grpo_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/grpo_math_qwen2.5_1.5b
Text Generation • 2B • Updated • 5
Cornell-AGI/ppo_math_qwen2.5_3b
Text Generation • 3B • Updated • 8
Cornell-AGI/rebel_math_qwen2.5_3b
Text Generation • 3B • Updated • 2
Cornell-AGI/apo_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/grpo_math_qwen2.5_7b
Text Generation • 8B • Updated • 4
Cornell-AGI/ppo_math_qwen2.5_7b
Text Generation • 8B • Updated • 15
datasets
15
Cornell-AGI/math_size_qwen2.5_7b_eval
Viewer • Updated • 7.5k • 20
Cornell-AGI/math_size_qwen2.5_3b_eval
Viewer • Updated • 7.5k • 8
Cornell-AGI/math_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.5k • 12
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 12
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 6
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 5
Cornell-AGI/amazon_movie_tv_item_mxbai
Viewer • Updated • 10.5k • 14
Cornell-AGI/amazon_movie_tv_llama_mxbai
Viewer • Updated • 17.1k • 70
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Viewer • Updated • 116k • 25 • 1
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 19 • 2