---
license: mit
pipeline_tag: text-to-image
library_name: diffusers
---
# AMD Nitro-E

## Introduction

Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference. Training takes only 1.5 days on a single node with 8 AMD Instinct™ MI300X GPUs. On the inference side, Nitro-E delivers a throughput of 18.8 samples per second (batch size 32, 512px images) on a single AMD Instinct MI300X GPU; the distilled version further increases throughput to 39.3 samples per second. The release consists of:

* [Nitro-E-512px](https://huggingface.co/amd/Nitro-E/blob/main/Nitro-E-512px.safetensors): an EMMDiT-based 20-step model trained from scratch.
* [Nitro-E-512px-dist](https://huggingface.co/amd/Nitro-E/blob/main/Nitro-E-512px-dist.safetensors): an EMMDiT-based model distilled from Nitro-E-512px.
* [Nitro-E-512px-GRPO](https://huggingface.co/amd/Nitro-E/tree/main/ckpt_grpo_512px): a post-trained model fine-tuned from Nitro-E-512px using the Group Relative Policy Optimization (GRPO) strategy.

⚡️ [Open-source code](https://github.com/AMD-AGI/Nitro-E)!
⚡️ [Technical blog](https://advanced-micro-devices-rocm-blogs--1559.com.readthedocs.build/projects/internal/en/1559/artificial-intelligence/nitro-e/README.html)!
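
As a rough sense of scale, the quoted throughput numbers imply the following amortized per-image latencies (simple arithmetic over the batch, not an additional benchmark):

```python
# Back-of-envelope per-image latency implied by the quoted throughputs
# (amortized over a batch of 32; arithmetic only, not a measured benchmark).
base_throughput = 18.8   # samples/s, Nitro-E-512px on one MI300X GPU
dist_throughput = 39.3   # samples/s, distilled 4-step model

base_ms = 1000 / base_throughput   # ~53 ms per image
dist_ms = 1000 / dist_throughput   # ~25 ms per image
print(f"base: {base_ms:.1f} ms/image, distilled: {dist_ms:.1f} ms/image")
```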

## Details

* **Model architecture**: We propose the Efficient Multimodal Diffusion Transformer (E-MMDiT), an efficient and lightweight multimodal diffusion model with only 304M parameters for fast image synthesis with low training-resource requirements. Our design philosophy centers on token reduction, since computational cost scales significantly with token count. We adopt a highly compressive visual tokenizer to produce a more compact representation and propose a novel multi-path compression module for further compression of tokens. To enhance our design, we introduce Position Reinforcement, which strengthens positional information to maintain spatial coherence, and Alternating Subregion Attention (ASA), which performs attention within subregions to further reduce computational cost. In addition, we propose AdaLN-affine, an efficient, lightweight module for computing modulation parameters in transformer blocks. See our technical blog post for more details.
* **Dataset**: Our models were trained on a dataset of ~25M images consisting of both real and synthetic data sources that are openly available on the internet. We make use of the following datasets for training: [Segment-Anything-1B](https://ai.meta.com/datasets/segment-anything/), [JourneyDB](https://journeydb.github.io/), [DiffusionDB](https://github.com/poloclub/diffusiondb), and [DataComp](https://huggingface.co/datasets/UCSC-VLAA/Recap-DataComp-1B) as prompts for the generated data.
* **Training cost**: The Nitro-E-512px model requires only 1.5 days of training from scratch on a single node with 8 AMD Instinct™ MI300X GPUs.
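
To give a feel for the idea behind Alternating Subregion Attention, here is a minimal PyTorch sketch that restricts self-attention to non-overlapping subregions of the token sequence and alternates (shifts) the region boundaries between blocks so information still mixes globally. The function name, the single-head attention, and the shift scheme are illustrative assumptions for this sketch, not the released implementation — see the open-source repo for the actual code:

```python
import torch
import torch.nn.functional as F

def subregion_attention(x, num_regions, shift):
    """Sketch: self-attention restricted to non-overlapping subregions.
    `shift` offsets the region boundaries so that alternating blocks
    mix tokens across region borders.
    x: (batch, tokens, dim); tokens must be divisible by num_regions.
    """
    b, n, d = x.shape
    if shift:
        x = torch.roll(x, shifts=n // (2 * num_regions), dims=1)
    # split the sequence into regions and attend within each region only:
    # cost scales with (n / num_regions)^2 per region instead of n^2
    r = x.reshape(b * num_regions, n // num_regions, d)
    out = F.scaled_dot_product_attention(r, r, r).reshape(b, n, d)
    if shift:
        out = torch.roll(out, shifts=-n // (2 * num_regions), dims=1)
    return out

x = torch.randn(2, 64, 32)
# even block: fixed regions; odd block: shifted regions
y0 = subregion_attention(x, num_regions=4, shift=False)
y1 = subregion_attention(y0, num_regions=4, shift=True)
print(y1.shape)  # torch.Size([2, 64, 32])
```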

## Quickstart

* **Image generation with 20 steps**:
```python
import torch
from core.tools.inference_pipe import init_pipe

device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
repo_name = "amd/Nitro-E"
ckpt_name = 'Nitro-E-512px.safetensors'
use_grpo = True

if use_grpo:
    # load the GRPO post-trained checkpoint on top of the base model
    pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name, ckpt_path_grpo='ckpt_grpo_512px')
else:
    pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name)

prompt = 'A hot air balloon in the shape of a heart grand canyon'
images = pipe(prompt=prompt, width=resolution, height=resolution, num_inference_steps=20, guidance_scale=4.5).images
```
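
Assuming the pipeline follows the usual diffusers convention of returning a list of PIL images in `.images`, the results can be written to disk as below. A blank placeholder image stands in for a real generation so the snippet is self-contained:

```python
from PIL import Image

# Placeholder for `pipe(...).images`; a real run returns PIL images like these.
images = [Image.new('RGB', (512, 512)) for _ in range(2)]

for i, image in enumerate(images):
    image.save(f'nitro_e_sample_{i}.png')  # one PNG per generated sample
```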

* **Image generation with 4 steps**:
```python
import torch
from core.tools.inference_pipe import init_pipe

device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
repo_name = "amd/Nitro-E"
ckpt_name = 'Nitro-E-512px-dist.safetensors'

pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name)
prompt = 'A hot air balloon in the shape of a heart grand canyon'

# the distilled model runs without classifier-free guidance
images = pipe(prompt=prompt, width=resolution, height=resolution, num_inference_steps=4, guidance_scale=0).images
```

## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

This project is licensed under the [MIT License](https://mit-license.org/).