Upload ckpt.

README.md (CHANGED)

This model card introduces a series of enhancements designed to improve the Qwen2.5-Omni model.
These improvements aim to ensure efficient performance of Qwen2.5-Omni across a range of hardware configurations, particularly those with lower GPU memory availability (e.g., RTX 3080, 4080, 5070).

Below, we provide a simple example showing how to run Qwen2.5-Omni-7B-GPTQ-Int4 with `gptqmodel`:
```
pip uninstall transformers
pip install git+https://github.com/huggingface/[email protected]
pip install accelerate
pip install gptqmodel==2.0.0
pip install numpy==2.0.0

git clone https://github.com/QwenLM/Qwen2.5-Omni.git
cd low-VRAM-mode/

CUDA_VISIBLE_DEVICES=0 python3 low_VRAM_demo_gptq.py
```
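
The pinned versions above matter for the demo; as a quick optional sanity check (not part of the official instructions), you can print what actually got installed before launching the script:

```python
# Optional sanity check: confirm the pinned packages resolved as expected.
from importlib.metadata import version

for pkg in ("transformers", "accelerate", "gptqmodel", "numpy"):
    print(pkg, version(pkg))
```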
We offer a toolkit to help you handle various types of audio and visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved audio, images and videos. You can install it using the following command and make sure your system has `ffmpeg` installed:
```
pip install qwen-omni-utils[decord] -U
```
If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-omni-utils -U`, which falls back to torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) so that decord is used when loading video.
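
As a brief illustration of the toolkit, the minimal sketch below prepares a chat-style conversation for the model. The conversation layout follows the Qwen2.5-Omni usage examples; the media URL is a placeholder to replace with your own file path, URL, or base64 data:

```python
# Minimal sketch: qwen-omni-utils resolves local paths, URLs, and base64 media
# referenced in a conversation into inputs ready for the model's processor.
from qwen_omni_utils import process_mm_info

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "https://example.com/clip.mp4"},  # placeholder URL
            {"type": "text", "text": "What is happening in this video?"},
        ],
    },
]

# Extract the audio, image, and video inputs referenced by the conversation.
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
# These outputs are then passed to the Qwen2.5-Omni processor together with the
# chat-templated text; see the low-VRAM demo script for a complete pipeline.
```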
### Performance and GPU memory requirements
The following two tables compare the performance and GPU memory consumption of Qwen2.5-Omni-7B-GPTQ-Int4 and Qwen2.5-Omni-7B on selected evaluation benchmarks. The GPTQ-Int4 model maintains comparable performance while reducing GPU memory requirements by more than 50%, enabling a broader range of devices to run the high-performance Qwen2.5-Omni-7B model. Note that the GPTQ-Int4 variant has slightly slower inference than the native model due to quantization overhead and CPU offloading.

| Evaluation Set | Task | Metrics | Qwen2.5-Omni-7B | Qwen2.5-Omni-7B-GPTQ-Int4 |
|----------------|------|---------|-----------------|---------------------------|
| LibriSpeech test-other | ASR | WER ⬇️ | 3.4 | 3.71 |
| WenetSpeech test-net | ASR | WER ⬇️ | 5.9 | 6.62 |
| Seed-TTS test-hard | TTS (Speaker: Chelsie) | WER ⬇️ | 8.7 | 10.3 |
| MMLU-Pro | Text -> Text | Accuracy ⬆️ | 47.0 | 43.76 |
| OmniBench | Speech -> Text | Accuracy ⬆️ | 56.13 | 53.59 |
| VideoMME | Multimodality -> Text | Accuracy ⬆️ | 72.4 | 68.0 |

| Model | Precision | 15 s Video | 30 s Video | 60 s Video |
|--------------|-----------|------------|------------|------------|
| Qwen-Omni-7B | FP32 | 93.56 GB | Not Recommended | Not Recommended |
| Qwen-Omni-7B | BF16 | 31.11 GB | 41.85 GB | 60.19 GB |
| Qwen-Omni-7B | GPTQ-Int4 | 11.64 GB | 17.43 GB | 29.51 GB |
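
When deciding between the BF16 and GPTQ-Int4 variants, it can help to compare the table above with the VRAM actually free on your GPU. A small helper sketch, assuming a single CUDA device:

```python
# Report free/total VRAM on CUDA device 0 (values are returned in bytes).
import torch

free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 1024**3:.2f} GiB / total: {total / 1024**3:.2f} GiB")
```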
## Citation