Code2Video: A Code-centric Paradigm for Educational Video Generation Paper • 2510.01174 • Published 26 days ago • 33
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 75
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement Paper • 2509.01977 • Published Sep 2 • 12
AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3 • 12
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8 • 58
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26 • 28
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 43
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 29
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework Paper • 2506.10741 • Published Jun 12 • 27
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Paper • 2506.05982 • Published Jun 6 • 2
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 53