atrost/math_sft_40K_trl_think_SFT_Regularized-0.5_Normalize-True Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.5_Normalize-False Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.3_Normalize-True Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.3_Normalize-False Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.1_Normalize-True Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.1_Normalize-False Text Generation • 2B • Updated Sep 25
atrost/math_sft_40K_trl_think_SFT_Regularized-0.7_Normalize-True Text Generation • 2B • Updated Sep 25 • 2
atrost/math_sft_40K_trl_think_SFT_Regularized-0.7_Normalize-False Text Generation • 2B • Updated Sep 25
hdong0/Qwen3-1.7B-base-Open-R1-GRPO_deepscaler_acc_8192_nokl Text Generation • 2B • Updated Oct 7 • 9