Abstract
Adversarial flow models unify adversarial and flow-based generative models, offering stable training, efficient generation, and high performance on image datasets.
We present adversarial flow models, a class of generative models that unifies adversarial models and flow models. Our method natively supports one-step or multi-step generation and is trained with an adversarial objective. Unlike traditional GANs, where the generator learns an arbitrary transport plan between the noise and data distributions, our generator learns a deterministic noise-to-data mapping, the same optimal transport as in flow-matching models. This significantly stabilizes adversarial training. Unlike consistency-based methods, our model directly learns one-step or few-step generation without learning the intermediate timesteps of the probability flow for propagation. This saves model capacity, reduces training iterations, and avoids error accumulation. Under the same 1NFE setting on ImageNet-256px, our B/2 model approaches the performance of consistency-based XL/2 models, while our XL/2 model sets a new best FID of 2.38. We additionally demonstrate end-to-end training of 56-layer and 112-layer models through depth repetition, without any intermediate supervision, achieving FIDs of 2.08 and 1.94 in a single forward pass and surpassing their 2NFE and 4NFE counterparts.
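To make the training setup concrete, below is a minimal, illustrative PyTorch sketch of one adversarial update for a one-step (1NFE) generator. It is not the paper's exact recipe: the hinge losses and the `G`/`D` arguments are assumptions made here for illustration, and the construction that constrains the generator to the flow-matching transport plan (rather than an arbitrary GAN coupling) is not specified on this page and is therefore omitted.

```python
# Illustrative sketch only (assumed details, not the paper's verified objective).
# A one-step generator maps noise z directly to a sample x_hat = G(z); both
# networks are updated with a standard hinge GAN objective.
import torch
import torch.nn.functional as F

def discriminator_step(G, D, opt_D, x_real, z):
    """Update D to separate real images from one-step generations."""
    with torch.no_grad():
        x_fake = G(z)                              # deterministic noise-to-data map
    loss_D = (F.relu(1.0 - D(x_real)).mean() +     # hinge loss on reals (assumed)
              F.relu(1.0 + D(x_fake)).mean())      # hinge loss on fakes (assumed)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    return loss_D.item()

def generator_step(G, D, opt_G, z):
    """Update G so the discriminator scores its one-step samples as real.

    Unlike flow matching, there is no MSE to a paired target: the distance
    being minimized is the learned discriminator score.
    """
    loss_G = -D(G(z)).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_G.item()
```

Sampling is then a single forward pass, `x_hat = G(z)`, with no ODE solver or timestep schedule.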
Community
Adversarial Flow Models (AF) unify Adversarial Models and Flow Models, and natively support single-step or multi-step training and generation.
Unlike GANs, which learn arbitrary transport plans, AF learns a deterministic Wasserstein-2 transport plan, the same as flow matching. This allows stable training on a standard transformer architecture.
Unlike Consistency Models, AF does not need to be trained on all timesteps to enforce the consistency constraint. This saves model capacity and avoids error propagation. On ImageNet 256px, AF-B/2 approaches the performance of Consistency-XL/2, while AF-XL/2 achieves a new best 1NFE FID of 2.38!
Unlike Flow Matching, whose MSE objective minimizes Euclidean distance and causes out-of-distribution generation without guidance, AF minimizes a learned discriminator distance, which better reflects semantic distance on the data manifold. AF can significantly surpass Flow Matching in the no-guidance setting.
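As a rough, code-level illustration of this contrast (the exact losses are assumptions here, not taken from the paper): flow matching regresses a velocity field under an MSE/Euclidean objective along a linear interpolation path, whereas an adversarial flow generator is scored by a learned discriminator.

```python
# Illustrative contrast only; the paper's precise formulations may differ.
import torch
import torch.nn.functional as F

def flow_matching_loss(v_theta, x0, x1, t):
    """Euclidean (MSE) objective: regress the target velocity x1 - x0
    along the linear interpolation path x_t = (1 - t) * x0 + t * x1."""
    x_t = (1 - t) * x0 + t * x1
    return F.mse_loss(v_theta(x_t, t), x1 - x0)

def adversarial_flow_generator_loss(G, D, z):
    """Learned distance: a discriminator, not Euclidean space, judges G(z)."""
    return -D(G(z)).mean()
```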
We also demonstrate end-to-end training of 56-layer and 112-layer 1NFE networks. No intermediate supervision. No teacher forcing. No manual timestep discretization. They surpass the 2NFE and 4NFE 28-layer counterparts, achieving a best FID of 1.94!
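"Depth repetition" is not defined on this page. A plausible reading, sketched below purely under that assumption, is that the backbone's block stack is applied several times within one forward pass, so a 28-block model runs as a 56- or 112-layer network at 1NFE. The class name, dimensions, and the choice to share weights across repetitions are all hypothetical.

```python
# Hypothetical sketch only: "depth repetition" is read here as running the
# block stack multiple times within a single forward pass. Whether weights are
# shared across repetitions is not stated on this page; this sketch shares them.
import torch
import torch.nn as nn

class DepthRepeatedBackbone(nn.Module):
    def __init__(self, blocks: nn.ModuleList, repeats: int = 2):
        super().__init__()
        self.blocks = blocks      # e.g. the 28 blocks of a base model
        self.repeats = repeats    # 2 -> 56 effective layers, 4 -> 112

    def forward(self, x):
        for _ in range(self.repeats):
            for block in self.blocks:
                x = block(x)      # one pass through the (shared) stack
        return x

# Usage with stand-in blocks (hypothetical dimensions):
blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
    for _ in range(28)
])
model = DepthRepeatedBackbone(blocks, repeats=2)
out = model(torch.randn(1, 256, 768))   # still a single forward pass (1NFE)
```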
The following papers were recommended by the Semantic Scholar API:
- SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator (2025)
- Generative AI in depth: A survey of recent advances, model variants, and real-world applications (2025)
- SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization (2025)
- GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver (2025)
- TReFT: Taming Rectified Flow Models For One-Step Image Translation (2025)
- Joint Discriminative-Generative Modeling via Dual Adversarial Training (2025)
- A Non-Adversarial Approach to Idempotent Generative Modelling (2025)