|
|
--- |
|
|
license: apache-2.0 |
|
|
library_name: videox_fun |
|
|
--- |
|
|
|
|
|
# Z-Image-Turbo-Fun-Controlnet-Union |
|
|
|
|
|
[GitHub](https://github.com/aigc-apps/VideoX-Fun)
|
|
|
|
|
## News |
|
|
A new version of this control model, with more control blocks and an inpaint mode, has been [released](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0).
|
|
|
|
|
## Model Features |
|
|
- This ControlNet attaches control blocks to 6 blocks of the base transformer.
|
|
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10. |
|
|
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
|
|
- You can adjust `control_context_scale` for stronger control and better detail preservation; its optimal range is 0.65 to 0.80. For better stability, we highly recommend using a detailed prompt. See the sketch after this list for how these pieces fit together.
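
For orientation, here is a minimal sketch of the intended workflow. The OpenCV preprocessing calls are real and the `control_context_scale` range comes from the notes above, but `load_z_image_control_pipeline` and the pipeline call signature are hypothetical placeholders; the actual entry point is `examples/z_image_fun/predict_t2i_control.py` in the VideoX-Fun repository.

```python
# Hypothetical sketch: only the OpenCV preprocessing and the
# control_context_scale range are taken from this card; the loader name,
# its arguments, and the pipeline signature are placeholders.
import cv2
from PIL import Image

# Build a Canny control image from any reference photo (real OpenCV calls).
bgr = cv2.imread("input.jpg")                           # placeholder path
edges = cv2.Canny(bgr, 100, 200)                        # single-channel edge map
control_image = Image.fromarray(edges).convert("RGB")   # 3-channel control input

pipeline = load_z_image_control_pipeline(               # hypothetical helper
    base_model_path="models/Diffusion_Transformer/Z-Image-Turbo",
    controlnet_path="models/Personalized_Model/Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
)
image = pipeline(
    prompt="a detailed prompt describing subject, lighting, and style",
    control_image=control_image,
    control_context_scale=0.70,   # recommended range: 0.65-0.80
)
image.save("output.png")
```

Within the recommended 0.65-0.80 range, larger values of `control_context_scale` enforce the control map more strongly.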
|
|
|
|
|
## TODO |
|
|
- [ ] Train on more data and for more steps. |
|
|
- [ ] Support inpaint mode. |
|
|
|
|
|
## Results |
|
|
|
|
|
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> |
|
|
<tr> |
|
|
<td>Pose</td> |
|
|
<td>Output</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><img src="asset/pose2.jpg" width="100%" /></td> |
|
|
<td><img src="results/pose2.png" width="100%" /></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> |
|
|
<tr> |
|
|
<td>Pose</td> |
|
|
<td>Output</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><img src="asset/pose.jpg" width="100%" /></td> |
|
|
<td><img src="results/pose.png" width="100%" /></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> |
|
|
<tr> |
|
|
<td>Canny</td> |
|
|
<td>Output</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><img src="asset/canny.jpg" width="100%" /></td> |
|
|
<td><img src="results/canny.png" width="100%" /></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> |
|
|
<tr> |
|
|
<td>HED</td> |
|
|
<td>Output</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><img src="asset/hed.jpg" width="100%" /></td> |
|
|
<td><img src="results/hed.png" width="100%" /></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> |
|
|
<tr> |
|
|
<td>Depth</td> |
|
|
<td>Output</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><img src="asset/depth.jpg" width="100%" /></td> |
|
|
<td><img src="results/depth.png" width="100%" /></td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
## Inference |
|
|
See the [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) repository for full details.
|
|
|
|
|
Please clone the VideoX-Fun repository and create the required directories: |
|
|
|
|
|
```sh |
|
|
# Clone the code |
|
|
git clone https://github.com/aigc-apps/VideoX-Fun.git |
|
|
|
|
|
# Enter VideoX-Fun's directory |
|
|
cd VideoX-Fun |
|
|
|
|
|
# Create model directories |
|
|
mkdir -p models/Diffusion_Transformer |
|
|
mkdir -p models/Personalized_Model |
|
|
``` |
|
|
|
|
|
Then download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model`:
|
|
|
|
|
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
├── 📂 Personalized_Model/
│   └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```
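
One way to fetch the weights is with `huggingface-cli`. The repo ids below are assumptions: the ControlNet id follows the link in the News section, and `Tongyi-MAI/Z-Image-Turbo` is a guess for the base model; adjust them to wherever you obtain the weights.

```sh
# Assumed repo ids; adjust as needed.
huggingface-cli download Tongyi-MAI/Z-Image-Turbo \
  --local-dir models/Diffusion_Transformer/Z-Image-Turbo
huggingface-cli download alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union \
  Z-Image-Turbo-Fun-Controlnet-Union.safetensors \
  --local-dir models/Personalized_Model
```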
|
|
|
|
|
Then run `examples/z_image_fun/predict_t2i_control.py` from the repository root, after adjusting the prompt and control-image paths inside the script if needed.
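
```sh
# Run from the VideoX-Fun root so the relative model paths above resolve
python examples/z_image_fun/predict_t2i_control.py
```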