EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning • Paper • 2510.17928 • Published 12 days ago • 2
DevBench: A Comprehensive Benchmark for Software Development • Paper • 2403.08604 • Published Mar 13, 2024 • 2
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution • Paper • 2501.05040 • Published Jan 9 • 15
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation • Paper • 2502.06563 • Published Feb 10
Confidence as a Reward: Transforming LLMs into Reward Models • Paper • 2510.13501 • Published 17 days ago • 1