ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11 • 22
This&That: Language-Gesture Controlled Video Generation for Robot Planning Paper • 2407.05530 • Published Jul 8, 2024 • 4
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10 • 20
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection Paper • 2301.01767 • Published Jan 4, 2023
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs Paper • 2309.03118 • Published Sep 6, 2023 • 2
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning Paper • 2402.11690 • Published Feb 18, 2024 • 10
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations Paper • 2401.18084 • Published Jan 31, 2024