VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks • Paper • 2511.04662 • Published about 1 month ago • 34
ResearchQA: Evaluating Scholarly Question Answering at Scale Across 75 Fields with Survey-Mined Questions and Rubrics • Paper • 2509.00496 • Published Aug 30 • 3
Concept Lancet: Image Editing with Compositional Representation Transplant • Paper • 2504.02828 • Published Apr 3 • 16