- Yume-1.5: A Text-Controlled Interactive World Generation Model (arXiv:2512.22096, published 4 days ago)
- LLaDA2.0: Scaling Up Diffusion Language Models to 100B (arXiv:2512.15745, published 21 days ago)
- DeContext as Defense: Safe Image Editing in Diffusion Transformers (arXiv:2512.16625, published 12 days ago)
- IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning (arXiv:2512.15635, published 13 days ago)
- Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length (arXiv:2512.04677, published 26 days ago)
- WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models (arXiv:2510.22276, published Oct 25; companion WAON collection, 4 items, updated Oct 28)
- UltraGen: High-Resolution Video Generation with Hierarchical Attention (arXiv:2510.18775, published Oct 21)
- Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset (arXiv:2510.15742, published Oct 17)
- RAE: collection for Diffusion Transformers with Representation Autoencoders (1 item, updated Oct 14)
- Self-Forcing++: Towards Minute-Scale High-Quality Video Generation (arXiv:2510.02283, published Oct 2)
- SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer (arXiv:2509.24695, published Sep 29)
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations (arXiv:2509.09676, published Sep 11)