OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 5 days ago
Rethinking the Diffusion Model from a Langevin Perspective Paper • 2604.10465 • Published 6 days ago
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7, 2025
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published 17 days ago
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published 22 days ago
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published 19 days ago
On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers Paper • 2603.28762 • Published 18 days ago
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published 24 days ago
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration Paper • 2603.24800 • Published 23 days ago
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics Paper • 2603.14375 • Published Mar 15
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published 24 days ago
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow Paper • 2603.19598 • Published 28 days ago