LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods • Paper • 2412.05579 • Published Dec 7, 2024 • 2
Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution • Paper • 2504.16563 • Published Apr 23 • 1
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • Paper • 2507.01006 • Published Jul 1 • 238
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis • Paper • 2506.04142 • Published Jun 4 • 27
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos • Paper • 2506.04141 • Published Jun 4 • 29