UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity Paper • 2511.13714 • Published Nov 2025 • 10
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 2025 • 154
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 2025 • 88
Zep: A Temporal Knowledge Graph Architecture for Agent Memory Paper • 2501.13956 • Published Jan 20, 2025 • 6
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models Paper • 2510.01623 • Published Oct 2, 2025 • 10
GigaBrain-0: A World Model-Powered Vision-Language-Action Model Paper • 2510.19430 • Published Oct 22, 2025 • 46
LightRAG: Simple and Fast Retrieval-Augmented Generation Paper • 2410.05779 • Published Oct 8, 2024 • 19
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction Paper • 2510.22706 • Published Oct 26, 2025 • 39
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model Paper • 2510.12276 • Published Oct 14, 2025 • 144
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23, 2025 • 29