---
license: apache-2.0
library_name: videox_fun
---
# Z-Image-Turbo-Fun-Controlnet-Union
[![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)
## News
A newer control model with more control blocks and an inpaint mode has been [released](https://huggingface.co/alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.0).
## Model Features
- This ControlNet injects control signals into 6 blocks of the base model.
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at a resolution of 1328 using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
- You can increase `control_context_scale` for stronger control and better detail preservation; the optimal range is 0.65 to 0.80 (see the sketch below). For better stability, we highly recommend using a detailed prompt.
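
As a rough mental model (an illustrative sketch only, not the actual `videox_fun` implementation), `control_context_scale` scales the ControlNet residual before it is merged back into the corresponding base-model block:

```python
import torch

def inject_control(hidden_states: torch.Tensor,
                   control_residual: torch.Tensor,
                   control_context_scale: float = 0.75) -> torch.Tensor:
    """Illustrative only: scale the ControlNet branch's residual before
    adding it to a base-model block. 0.65-0.80 works best for this model."""
    return hidden_states + control_context_scale * control_residual
```

Lower values weaken the control signal and give the base model more freedom; higher values follow the control condition more strictly.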
## TODO
- [ ] Train on more data and for more steps.
- [ ] Support inpaint mode.
## Results
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>HED</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/hed.jpg" width="100%" /></td>
<td><img src="results/hed.png" width="100%" /></td>
</tr>
</table>
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/depth.jpg" width="100%" /></td>
<td><img src="results/depth.png" width="100%" /></td>
</tr>
</table>
## Inference
See the [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) repository for full details. First, clone the code and create the required model directories:
```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# Enter VideoX-Fun's directory
cd VideoX-Fun
# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```
Next, download the weights into `models/Diffusion_Transformer` and `models/Personalized_Model` so that the layout matches:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
├── 📂 Personalized_Model/
│   └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```
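For example, the weights can be fetched with `huggingface_hub` (a minimal sketch; this repo's id is inferred from the News link above, and the base-model repo id `Tongyi-MAI/Z-Image-Turbo` is an assumption, so substitute the correct ids if they differ):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# ControlNet weights from this repository
hf_hub_download(
    repo_id="alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union",
    filename="Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
    local_dir="models/Personalized_Model",
)

# Z-Image-Turbo base weights (assumed repo id; adjust if needed)
snapshot_download(
    repo_id="Tongyi-MAI/Z-Image-Turbo",
    local_dir="models/Diffusion_Transformer/Z-Image-Turbo",
)
```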
Finally, run `python examples/z_image_fun/predict_t2i_control.py` to generate images from a control condition; see the script itself for the available settings, such as the prompt, the control image path, and `control_context_scale`.