LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions Paper • 2508.18321 • Published Aug 24 • 2
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics Paper • 2510.05137 • Published Oct 1 • 4
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 138
Article 🦸🏻#1: Open-endedness and AI Agents – A Path from Generative to Creative AI? Dec 25, 2024 • 16
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17 • 39
Long Reasoning Collection Datasets with reasoning traces for math and code (Train + Eval) • 49 items • Updated Mar 21 • 1
Reasoning Datasets Collection Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 61
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 174
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45
Direct Preference Optimization Datasets Collection Datasets suitable for DPO based on having 'chosen', 'rejected', and 'prompt' columns. Created using librarian-bots/dataset-column-search-api • 5520 items • Updated Apr 6 • 7