Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents Paper • 2510.14438 • Published Oct 16, 2025 • 13
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset Paper • 2109.07679 • Published Sep 16, 2021
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph Paper • 2311.09174 • Published Nov 15, 2023
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation Paper • 2402.10646 • Published Feb 16, 2024
CKBP v2: Better Annotation and Reasoning for Commonsense Knowledge Base Population Paper • 2304.10392 • Published Apr 20, 2023
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression Paper • 2509.15763 • Published Sep 19, 2025
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Paper • 2310.09044 • Published Oct 13, 2023
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding Paper • 2310.12874 • Published Oct 19, 2023
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering Paper • 2305.14869 • Published May 24, 2023 • 1
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning Paper • 2404.09403 • Published Apr 15, 2024
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce Paper • 2406.10173 • Published Jun 14, 2024
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2, 2024 • 26
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization Paper • 2410.19609 • Published Oct 25, 2024 • 18
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models Paper • 2412.16545 • Published Dec 21, 2024
ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations Paper • 2304.14827 • Published Apr 28, 2023
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model Paper • 2504.21024 • Published Apr 23, 2025 • 2
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models Paper • 2505.22654 • Published May 28, 2025 • 1