Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations • Paper • 2510.23607 • Published 29 days ago • 172
VidText: Towards Comprehensive Evaluation for Video Text Understanding • Paper • 2505.22810 • Published May 28 • 19