Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published Jun 3 • 17
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 26
HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published May 30 • 43
Faith and Fate: Limits of Transformers on Compositionality Paper • 2305.18654 • Published May 29, 2023 • 7
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published Feb 16 • 59
NovelQA: A Benchmark for Long-Range Novel Question Answering Paper • 2403.12766 • Published Mar 18, 2024 • 1