GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Paper • 2505.17022 • Published May 22, 2025 • 27
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation Paper • 2503.16430 • Published Mar 20, 2025 • 34
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published Oct 14, 2024 • 21
Twins: Revisiting the Design of Spatial Attention in Vision Transformers Paper • 2104.13840 • Published Apr 28, 2021
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36
CenterMask: single shot instance segmentation with point representation Paper • 2004.04446 • Published Apr 9, 2020