Abstract
Adversarial flow models unify adversarial and flow-based generative models, offering stable training, efficient generation, and high performance on image datasets.
We present adversarial flow models, a class of generative models that unifies adversarial models and flow models. Our method natively supports one-step or multi-step generation and is trained with an adversarial objective. Unlike traditional GANs, where the generator learns an arbitrary transport plan between the noise and data distributions, our generator learns a deterministic noise-to-data mapping, the same optimal transport as in flow-matching models. This significantly stabilizes adversarial training. Unlike consistency-based methods, our model directly learns one-step or few-step generation without learning the intermediate timesteps of the probability flow for propagation. This saves model capacity, reduces training iterations, and avoids error accumulation. Under the same 1NFE setting on ImageNet-256px, our B/2 model approaches the performance of consistency-based XL/2 models, while our XL/2 model sets a new best FID of 2.38. We additionally demonstrate end-to-end training of 56-layer and 112-layer models through depth repetition, without any intermediate supervision, achieving FIDs of 2.08 and 1.94 in a single forward pass and surpassing their 2NFE and 4NFE counterparts.
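To make the training setup concrete, below is a minimal, illustrative PyTorch sketch of one adversarial update for a one-step (1NFE) generator. It is not the paper's exact recipe: the hinge losses and the `G`/`D` arguments are assumptions made here for illustration, and the construction that constrains the generator to the flow-matching transport plan (rather than an arbitrary GAN coupling) is not specified on this page and is therefore omitted.

```python
# Illustrative sketch only (assumed details, not the paper's verified objective).
# A one-step generator maps noise z directly to a sample x_hat = G(z); both
# networks are updated with a standard hinge GAN objective.
import torch
import torch.nn.functional as F

def discriminator_step(G, D, opt_D, x_real, z):
    """Update D to separate real images from one-step generations."""
    with torch.no_grad():
        x_fake = G(z)                              # deterministic noise-to-data map
    loss_D = (F.relu(1.0 - D(x_real)).mean() +     # hinge loss on reals (assumed)
              F.relu(1.0 + D(x_fake)).mean())      # hinge loss on fakes (assumed)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    return loss_D.item()

def generator_step(G, D, opt_G, z):
    """Update G so the discriminator scores its one-step samples as real.

    Unlike flow matching, there is no MSE to a paired target: the distance
    being minimized is the learned discriminator score.
    """
    loss_G = -D(G(z)).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_G.item()
```

Sampling is then a single forward pass, `x_hat = G(z)`, with no ODE solver or timestep schedule.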
Community
Adversarial Flow Models (AF) unify Adversarial Models and Flow Models, and natively support single-step or multi-step training and generation.
Unlike GANs, which learn arbitrary transport plans, AF learns a deterministic Wasserstein-2 transport plan, the same as flow matching. This allows stable training on a standard transformer architecture.
Unlike Consistency Models, AF does not need to be trained on all timesteps to enforce the consistency constraint. This saves model capacity and avoids error propagation. On ImageNet 256px, AF-B/2 approaches the performance of Consistency-XL/2, while AF-XL/2 achieves a new best 1NFE FID of 2.38!
Unlike Flow Matching, whose MSE objective minimizes Euclidean distance and causes out-of-distribution generation without guidance, AF minimizes a learned discriminator distance, which better reflects semantic distance on the data manifold. AF can significantly surpass Flow Matching in the no-guidance setting.
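As a rough, code-level illustration of this contrast (the exact losses are assumptions here, not taken from the paper): flow matching regresses a velocity field under an MSE/Euclidean objective along a linear interpolation path, whereas an adversarial flow generator is scored by a learned discriminator.

```python
# Illustrative contrast only; the paper's precise formulations may differ.
import torch
import torch.nn.functional as F

def flow_matching_loss(v_theta, x0, x1, t):
    """Euclidean (MSE) objective: regress the target velocity x1 - x0
    along the linear interpolation path x_t = (1 - t) * x0 + t * x1."""
    x_t = (1 - t) * x0 + t * x1
    return F.mse_loss(v_theta(x_t, t), x1 - x0)

def adversarial_flow_generator_loss(G, D, z):
    """Learned distance: a discriminator, not Euclidean space, judges G(z)."""
    return -D(G(z)).mean()
```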
We also demonstrate end-to-end training of 56-layer and 112-layer 1NFE networks. No intermediate supervision. No teacher forcing. No manual timestep discretization. They surpass the 2NFE and 4NFE 28-layer counterparts, achieving a best FID of 1.94!
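"Depth repetition" is not defined on this page. A plausible reading, sketched below purely under that assumption, is that the backbone's block stack is applied several times within one forward pass, so a 28-block model runs as a 56- or 112-layer network at 1NFE. The class name, dimensions, and the choice to share weights across repetitions are all hypothetical.

```python
# Hypothetical sketch only: "depth repetition" is read here as running the
# block stack multiple times within a single forward pass. Whether weights are
# shared across repetitions is not stated on this page; this sketch shares them.
import torch
import torch.nn as nn

class DepthRepeatedBackbone(nn.Module):
    def __init__(self, blocks: nn.ModuleList, repeats: int = 2):
        super().__init__()
        self.blocks = blocks      # e.g. the 28 blocks of a base model
        self.repeats = repeats    # 2 -> 56 effective layers, 4 -> 112

    def forward(self, x):
        for _ in range(self.repeats):
            for block in self.blocks:
                x = block(x)      # one pass through the (shared) stack
        return x

# Usage with stand-in blocks (hypothetical dimensions):
blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    for _ in range(28)
])
model = DepthRepeatedBackbone(blocks, repeats=2)
out = model(torch.randn(1, 256, 768))   # still a single forward pass (1NFE)
```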
The following papers were recommended by the Semantic Scholar API:
- SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator (2025)
- Generative AI in depth: A survey of recent advances, model variants, and real-world applications (2025)
- SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization (2025)
- GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver (2025)
- TReFT: Taming Rectified Flow Models For One-Step Image Translation (2025)
- Joint Discriminative-Generative Modeling via Dual Adversarial Training (2025)
- A Non-Adversarial Approach to Idempotent Generative Modelling (2025)