My favourites - a Warvito Collection

updated 3 days ago

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 150
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13 • 8
Improving the Diffusability of Autoencoders

Paper • 2502.14831 • Published Feb 20 • 2
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Paper • 2410.10733 • Published Oct 14, 2024 • 8
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

Paper • 2508.00413 • Published Aug 1 • 5
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14 • 20
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20 • 36
MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29 • 28
Waver: Wave Your Way to Lifelike Video Generation

Paper • 2508.15761 • Published Aug 21 • 33
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 258
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12 • 30
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 189
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10 • 126
Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published Apr 24 • 92
Transition Matching: Scalable and Flexible Generative Modeling

Paper • 2506.23589 • Published Jun 30 • 1
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 96
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30 • 88
Diffusion Beats Autoregressive in Data-Constrained Settings

Paper • 2507.15857 • Published Jul 21 • 1
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26 • 43
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Paper • 2509.06818 • Published Sep 8 • 29
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Paper • 2509.14055 • Published Sep 17 • 14
Inpainting-Guided Policy Optimization for Diffusion Large Language Models

Paper • 2509.10396 • Published Sep 12 • 15
Lynx: Towards High-Fidelity Personalized Video Generation

Paper • 2509.15496 • Published Sep 19 • 12
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Paper • 2509.19296 • Published Sep 23 • 22
Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 95
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22
Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24 • 75
Stochastic activations

Paper • 2509.22358 • Published Sep 26 • 2
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Paper • 2509.24900 • Published 29 days ago • 53
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published 15 days ago • 159
WithAnyone: Towards Controllable and ID Consistent Image Generation

Paper • 2510.14975 • Published 12 days ago • 79
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Paper • 2510.20766 • Published 5 days ago • 29