TMLR-Group-HF/Co-rewarding-RephrasedMATH
Co-rewarding is a novel self-supervised reinforcement learning (RL) framework that improves training stability by seeking complementary supervision from other views.