Concentration of Measure for Distributions Generated via Diffusion Models Paper • 2501.07741 • Published Jan 13, 2025
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation Paper • 2501.05414 • Published Jan 9, 2025
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Paper • 2505.13444 • Published May 19, 2025
Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation Paper • 2310.03780 • Published Oct 5, 2023
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024
ProtoTEx: Explaining Model Decisions with Prototype Tensors Paper • 2204.05426 • Published Apr 11, 2022
The State of Human-centered NLP Technology for Fact-checking Paper • 2301.03056 • Published Jan 8, 2023