xiongwang committed on
Commit
f4c283c
·
1 Parent(s): 6986a6a

Upload ckpt.

Files changed (1)
  1. README.md +15 -18
README.md CHANGED
@@ -56,19 +56,19 @@ This model card introduces a series of enhancements designed to improve the Qwen

 These improvements aim to ensure efficient performance of Qwen2.5-Omni across a range of hardware configurations, particularly those with lower GPU memory availability (RTX3080, 4080, 5070, etc).

- Below, we provide simple examples to show how to use Qwen2.5-Omni-7B-GPTQ-Int4 with `gptqmodel` as follows:
 ```
 pip uninstall transformers
 pip install git+https://github.com/huggingface/[email protected]
 pip install accelerate
 pip install gptqmodel==2.0.0
 pip install numpy==2.0.0
- ```
- and then
- ```
 git clone https://github.com/QwenLM/Qwen2.5-Omni.git
 cd low-VRAM-mode/
- python low_VRAM_demo_gptq.py
 ```

 We offer a toolkit to help you handle various types of audio and visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved audio, images and videos. You can install it using the following command and make sure your system has `ffmpeg` installed:
@@ -81,27 +81,24 @@ pip install qwen-omni-utils[decord] -U

 If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-omni-utils -U` which will fall back to using torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) to get decord used when loading video.


- ### Performance

 | Evaluation Set | Task | Metrics | Qwen2.5-Omni-7B | Qwen2.5-Omni-7B-GPTQ-Int4 |
 |--------------|-----------| ------------- | ------------- | ------------------ |
- | LibriSpeech test-other | ASR | WER ⬇️ | 3.4 | 3.71 |
- | WenetSpeech test-net | ASR | WER ⬇️ | 5.9 | 6.62 |
- | Seed-TTS test-hard (Speaker: Chelsie) | TTS | WER ⬇️ | 8.7 | 10.3 |
- | MMLU-Pro | Text -> Text | Accuracy ⬆️ | 47.0 | 43.76 |
- | OmniBench | Speech -> Text | Accuracy ⬆️ | 56.13 | 53.59 |
- | VideoMME | Multimodality -> Text | Accuracy ⬆️ | 72.4 | 68.0 |
-
-
-
- ### Minimum GPU memory requirements

 |Model | Precision | 15(s) Video | 30(s) Video | 60(s) Video |
 |--------------|-----------| ------------- | ------------- | ------------------ |
- | Qwen-Omni-3B | FP32 | 89.10 GB | Not Recommend | Not Recommend |
- | Qwen-Omni-3B | BF16 | 18.38 GB | 22.43 GB | 28.22 GB |
 | Qwen-Omni-7B | FP32 | 93.56 GB | Not Recommend | Not Recommend |
 | Qwen-Omni-7B | BF16 | 31.11 GB | 41.85 GB | 60.19 GB |


 ## Citation
 

 These improvements aim to ensure efficient performance of Qwen2.5-Omni across a range of hardware configurations, particularly those with lower GPU memory availability (RTX3080, 4080, 5070, etc).

+ Below, we provide a simple example showing how to use Qwen2.5-Omni-7B-GPTQ-Int4 with `gptqmodel`:
 ```
 pip uninstall transformers
 pip install git+https://github.com/huggingface/[email protected]
 pip install accelerate
 pip install gptqmodel==2.0.0
 pip install numpy==2.0.0
+
 git clone https://github.com/QwenLM/Qwen2.5-Omni.git
+
 cd low-VRAM-mode/
+
+ CUDA_VISIBLE_DEVICES=0 python3 low_VRAM_demo_gptq.py
 ```
73
 
74
  We offer a toolkit to help you handle various types of audio and visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved audio, images and videos. You can install it using the following command and make sure your system has `ffmpeg` installed:
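As an illustration of the base64 input style the toolkit accepts, the sketch below builds a conversation message carrying an inline base64 audio payload. The message layout follows the Qwen2.5-Omni model card examples; the payload here is placeholder bytes rather than a real recording, so this only demonstrates the structure, not actual transcription:

```python
import base64

# Placeholder bytes standing in for a real WAV file read from disk.
fake_wav_bytes = b"RIFF....WAVEfmt "
audio_b64 = base64.b64encode(fake_wav_bytes).decode("utf-8")

# A conversation in the multimodal message format that qwen-omni-utils
# is designed to parse; base64 data URIs are one of the supported audio
# sources alongside plain URLs and local file paths.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": f"data:audio/wav;base64,{audio_b64}"},
            {"type": "text", "text": "What is said in this audio clip?"},
        ],
    }
]

print(conversation[0]["content"][0]["audio"][:22])  # the data-URI prefix
```

In practice you would pass such a conversation through the toolkit's preprocessing (together with the model's processor) before generation; see the Qwen2.5-Omni repository for the end-to-end demo.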
 
 If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-omni-utils -U` which will fall back to using torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) to get decord used when loading video.


+ ### Performance and GPU memory requirements
+
+ The following two tables compare the performance and GPU memory consumption of Qwen2.5-Omni-7B-GPTQ-Int4 and Qwen2.5-Omni-7B on specific evaluation benchmarks. The data demonstrates that the GPTQ-Int4 model maintains comparable performance while reducing GPU memory requirements by more than 50%, enabling a broader range of devices to run and experience the high-performance Qwen2.5-Omni-7B model. Notably, the GPTQ-Int4 variant exhibits slightly slower inference speeds than the native Qwen2.5-Omni-7B model due to quantization techniques and CPU offload mechanisms.

 | Evaluation Set | Task | Metrics | Qwen2.5-Omni-7B | Qwen2.5-Omni-7B-GPTQ-Int4 |
 |--------------|-----------| ------------- | ------------- | ------------------ |
+ | LibriSpeech test-other | ASR | WER ⬇️ | 3.4 | 3.71 |
+ | WenetSpeech test-net | ASR | WER ⬇️ | 5.9 | 6.62 |
+ | Seed-TTS test-hard | TTS (Speaker: Chelsie) | WER ⬇️ | 8.7 | 10.3 |
+ | MMLU-Pro | Text -> Text | Accuracy ⬆️ | 47.0 | 43.76 |
+ | OmniBench | Speech -> Text | Accuracy ⬆️ | 56.13 | 53.59 |
+ | VideoMME | Multimodality -> Text | Accuracy ⬆️ | 72.4 | 68.0 |
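As a quick sanity check on the "comparable performance" claim, the signed relative change for each row of the table above can be computed directly (values copied from the table; a higher WER or a lower accuracy both indicate degradation):

```python
# (benchmark, BF16 baseline score, GPTQ-Int4 score) from the table above.
rows = [
    ("LibriSpeech test-other (WER)", 3.4, 3.71),
    ("WenetSpeech test-net (WER)", 5.9, 6.62),
    ("Seed-TTS test-hard (WER)", 8.7, 10.3),
    ("MMLU-Pro (Acc)", 47.0, 43.76),
    ("OmniBench (Acc)", 56.13, 53.59),
    ("VideoMME (Acc)", 72.4, 68.0),
]

for name, base, quant in rows:
    # Signed relative change of the quantized model vs. the BF16 baseline.
    rel = (quant - base) / base * 100
    print(f"{name}: {rel:+.1f}%")
```

All relative changes stay within roughly ±20% of the baseline, which is the sense in which the quantized model is "comparable" here.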
 
 
 
 

 |Model | Precision | 15(s) Video | 30(s) Video | 60(s) Video |
 |--------------|-----------| ------------- | ------------- | ------------------ |
 | Qwen-Omni-7B | FP32 | 93.56 GB | Not Recommend | Not Recommend |
 | Qwen-Omni-7B | BF16 | 31.11 GB | 41.85 GB | 60.19 GB |
+ | Qwen-Omni-7B | GPTQ-Int4 | 11.64 GB | 17.43 GB | 29.51 GB |
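The "more than 50%" memory-saving claim can be verified directly from the BF16 and GPTQ-Int4 rows of the table above:

```python
# GPU memory requirements (GB) copied from the table above.
bf16 = {"15s": 31.11, "30s": 41.85, "60s": 60.19}
gptq_int4 = {"15s": 11.64, "30s": 17.43, "60s": 29.51}

for length in bf16:
    # Fractional reduction of GPTQ-Int4 relative to the BF16 baseline.
    saved = 1 - gptq_int4[length] / bf16[length]
    print(f"{length} video: {saved:.1%} less GPU memory")
# The reduction exceeds 50% at every video length.
```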
  ## Citation