Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 150
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13 • 8
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models Paper • 2410.10733 • Published Oct 14, 2024 • 8
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space Paper • 2508.00413 • Published Aug 1 • 5
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published Apr 14 • 20
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20 • 36
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26 • 36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis Paper • 2509.10441 • Published Sep 12 • 30
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10 • 126
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 92
Transition Matching: Scalable and Flexible Generative Modeling Paper • 2506.23589 • Published Jun 30 • 1
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 88
Diffusion Beats Autoregressive in Data-Constrained Settings Paper • 2507.15857 • Published Jul 21 • 1
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward Paper • 2509.06818 • Published Sep 8 • 29
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication Paper • 2509.14055 • Published Sep 17 • 14
Inpainting-Guided Policy Optimization for Diffusion Large Language Models Paper • 2509.10396 • Published Sep 12 • 15
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Paper • 2509.19296 • Published Sep 23 • 22
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23 • 22
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 75
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing Paper • 2509.24900 • Published 29 days ago • 53
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 15 days ago • 159
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published 12 days ago • 79
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 74
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion Paper • 2510.20766 • Published 5 days ago • 29