PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published 15 days ago • 21
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 108
Geometry-Editable and Appearance-Preserving Object Compositon Paper • 2505.20914 • Published May 27 • 6
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? Paper • 2506.05287 • Published Jun 5 • 15
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 47