ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization Paper • 2509.13313 • Published Sep 16, 2025 • 77
Training Step-Level Reasoning Verifiers with Formal Verification Tools Paper • 2505.15960 • Published May 21, 2025 • 7
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships? Paper • 2502.11300 • Published Feb 16, 2025
From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models Paper • 2406.11106 • Published Jun 17, 2024
RONA: Pragmatically Diverse Image Captioning with Coherence Relations Paper • 2503.10997 • Published Mar 14, 2025 • 1
Concurrent Adversarial Learning for Large-Batch Training Paper • 2106.00221 • Published Jun 1, 2021 • 1
Rethinking Architecture Selection in Differentiable NAS Paper • 2108.04392 • Published Aug 10, 2021
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning Paper • 2402.15751 • Published Feb 24, 2024
MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries? Paper • 2406.17806 • Published Jun 22, 2024 • 1
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Paper • 2407.00256 • Published Jun 28, 2024 • 1
Understanding the Impact of Negative Prompts: When and How Do They Take Effect? Paper • 2406.02965 • Published Jun 5, 2024
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion Paper • 2402.12741 • Published Feb 20, 2024
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing Paper • 2502.09977 • Published Feb 14, 2025 • 1
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7, 2025 • 57
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers Paper • 2412.09722 • Published Dec 12, 2024 • 5
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information Paper • 2412.00947 • Published Dec 1, 2024 • 8
AAAR-1.0: Assessing AI's Potential to Assist Research Paper • 2410.22394 • Published Oct 29, 2024 • 16
IMDb data from Two Generations, from 1979 to 2019; Part one, Dataset Introduction and Preliminary Analysis Paper • 2005.14147 • Published May 28, 2020
Evaluating LLMs at Detecting Errors in LLM Responses Paper • 2404.03602 • Published Apr 4, 2024 • 3