ascust committed · Commit 402a89f · verified · 1 parent: 82cb97e

Update README.md

Files changed (1): README.md (+77 −3)
---
license: mit
pipeline_tag: text-to-image
library_name: diffusers
---
# AMD Nitro-E

![image/png](https://huggingface.co/amd/Nitro-E/resolve/main/assets/teaser.png)

## Introduction
Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference. Training takes only 1.5 days on a single node with 8 AMD Instinct™ MI300X GPUs. On the inference side, Nitro-E delivers a throughput of 18.8 samples per second (batch size 32, 512px images) on a single AMD Instinct MI300X GPU, and the distilled version further increases throughput to 39.3 samples per second. The release consists of:

* [Nitro-E-512px](https://huggingface.co/amd/Nitro-E/blob/main/Nitro-E-512px.safetensors): an E-MMDiT-based 20-step model trained from scratch.
* [Nitro-E-512px-dist](https://huggingface.co/amd/Nitro-E/blob/main/Nitro-E-512px-dist.safetensors): an E-MMDiT-based model distilled from Nitro-E-512px.
* [Nitro-E-512px-GRPO](https://huggingface.co/amd/Nitro-E/tree/main/ckpt_grpo_512px): a post-trained model fine-tuned from Nitro-E-512px using the Group Relative Policy Optimization (GRPO) strategy.

⚡️ [Open-source code](https://github.com/AMD-AGI/Nitro-E)!
⚡️ [Technical blog](https://advanced-micro-devices-rocm-blogs--1559.com.readthedocs.build/projects/internal/en/1559/artificial-intelligence/nitro-e/README.html)!

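If you want the checkpoint files linked above on disk ahead of time, they can be fetched with `huggingface_hub`. This is a minimal sketch assuming the file names shown in the links; the Quickstart below loads the weights through the repo's own `init_pipe` helper, so this step is optional.

```python
from huggingface_hub import hf_hub_download

# Optional: pre-download the released checkpoints from the amd/Nitro-E repo.
# File names follow the links above; adjust if the repository layout changes.
base_ckpt = hf_hub_download(repo_id="amd/Nitro-E", filename="Nitro-E-512px.safetensors")
dist_ckpt = hf_hub_download(repo_id="amd/Nitro-E", filename="Nitro-E-512px-dist.safetensors")
print(base_ckpt, dist_ckpt)  # local cache paths of the downloaded weights
```
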
## Details

* **Model architecture**: We propose the Efficient Multimodal Diffusion Transformer (E-MMDiT), an efficient and lightweight multimodal diffusion model with only 304M parameters for fast image synthesis with low training resources. Our design philosophy centers on token reduction, as the computational cost scales significantly with the token count. We adopt a highly compressive visual tokenizer to produce a more compact representation and propose a novel multi-path compression module for further token compression. To enhance our design, we introduce Position Reinforcement, which strengthens positional information to maintain spatial coherence, and Alternating Subregion Attention (ASA), which performs attention within subregions to further reduce computational cost. In addition, we propose AdaLN-affine, an efficient lightweight module for computing modulation parameters in transformer blocks. See our technical blog post for more details; a rough sketch of the ASA idea is shown after this list.
* **Dataset**: Our models were trained on a dataset of ~25M images consisting of both real and synthetic data sources that are openly available on the internet. We use the following datasets for training: [Segment-Anything-1B](https://ai.meta.com/datasets/segment-anything/), [JourneyDB](https://journeydb.github.io/), and [DiffusionDB](https://github.com/poloclub/diffusiondb), with prompts from [DataComp](https://huggingface.co/datasets/UCSC-VLAA/Recap-DataComp-1B) used to produce the generated (synthetic) data.
* **Training cost**: The Nitro-E-512px model requires only 1.5 days of training from scratch on a single node with 8 AMD Instinct™ MI300X GPUs.

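As referenced above, the following is a minimal, illustrative sketch of the Alternating Subregion Attention idea only, not the released E-MMDiT implementation: tokens are split into subregions, self-attention runs within each subregion, and successive layers alternate between two partitions so information still mixes over depth. The function name and partition scheme are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def alternating_subregion_attention(x, num_heads, num_regions, layer_idx):
    """Toy subregion attention: attend only within token subregions.

    x: (batch, tokens, dim); num_regions must divide the token count.
    Alternates between a contiguous and a strided partition by layer index,
    so tokens mix across subregions over successive layers.
    """
    B, N, D = x.shape
    if layer_idx % 2 == 1:
        # Strided partition: interleave tokens so each subregion spans the sequence.
        perm = torch.arange(N).view(num_regions, -1).t().reshape(-1)
    else:
        # Contiguous partition: each subregion is a consecutive chunk of tokens.
        perm = torch.arange(N)
    inv = torch.argsort(perm)

    xr = x[:, perm].reshape(B * num_regions, N // num_regions, D)
    h = xr.view(*xr.shape[:2], num_heads, D // num_heads).transpose(1, 2)
    # Toy case: queries, keys and values are the same (no learned projections).
    out = F.scaled_dot_product_attention(h, h, h)
    out = out.transpose(1, 2).reshape(B * num_regions, -1, D).reshape(B, N, D)
    return out[:, inv]  # restore the original token order

# Example: 1024 tokens split into 4 subregions of 256 tokens each.
tokens = torch.randn(2, 1024, 64)
y = alternating_subregion_attention(tokens, num_heads=4, num_regions=4, layer_idx=0)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Restricting attention to subregions reduces the quadratic attention cost by roughly the number of subregions, which is the point the bullet above makes about token-count scaling.
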
## Quickstart
* **Image generation with 20 steps**:
```python
import torch
from core.tools.inference_pipe import init_pipe

device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
repo_name = "amd/Nitro-E"
ckpt_name = 'Nitro-E-512px.safetensors'
use_grpo = True  # set to False to use the base Nitro-E-512px checkpoint

if use_grpo:
    # Load the GRPO post-trained weights on top of the base checkpoint.
    pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name, ckpt_path_grpo='ckpt_grpo_512px')
else:
    pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name)

prompt = 'A hot air balloon in the shape of a heart grand canyon'
images = pipe(prompt=prompt, width=resolution, height=resolution, num_inference_steps=20, guidance_scale=4.5).images
```

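The call above returns the generated images under `.images`. As a small usage note, and assuming the standard PIL image objects returned by diffusers-style pipelines, they can be written to disk like this (file names are arbitrary examples):

```python
# Save each generated image from the previous snippet.
for i, img in enumerate(images):
    img.save(f"nitro_e_20steps_{i}.png")
```
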
* **Image generation with 4 steps**:
```python
import torch
from core.tools.inference_pipe import init_pipe

device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
repo_name = "amd/Nitro-E"
ckpt_name = 'Nitro-E-512px-dist.safetensors'  # distilled checkpoint

pipe = init_pipe(device, dtype, resolution, repo_name=repo_name, ckpt_name=ckpt_name)
prompt = 'A hot air balloon in the shape of a heart grand canyon'

# The distilled model samples in 4 steps with classifier-free guidance disabled.
images = pipe(prompt=prompt, width=resolution, height=resolution, num_inference_steps=4, guidance_scale=0).images
```

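The throughput figures quoted in the introduction are measured at batch size 32. A minimal way to generate such a batch, continuing from the snippet above and assuming the pipeline accepts a list of prompts as standard diffusers pipelines do (an assumption, not a documented guarantee of `init_pipe`):

```python
# Batched generation sketch: repeat the prompt 32 times to fill one batch.
# Assumes the returned pipeline accepts a list of prompts, like standard diffusers pipelines.
prompts = ['A hot air balloon in the shape of a heart grand canyon'] * 32
batch = pipe(prompt=prompts, width=resolution, height=resolution, num_inference_steps=4, guidance_scale=0).images
print(len(batch))  # expected: 32 images
```
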
## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

This project is licensed under the [MIT License](https://mit-license.org/).