TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them • Paper 2509.21117 • Published Sep 25, 2025
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future • Paper 2508.06026 • Published Aug 8, 2025
RewardAnything: Generalizable Principle-Following Reward Models • Paper 2506.03637 • Published Jun 4, 2025
Outcome-Refining Process Supervision for Code Generation • Paper 2412.15118 • Published Dec 19, 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models • Paper 2402.15043 • Published Feb 23, 2024