Upload ckpt.

README.md (CHANGED)

This model card introduces a series of enhancements designed to improve the Qwen2.5-Omni model.
These improvements aim to ensure efficient performance of Qwen2.5-Omni across a range of hardware configurations, particularly those with lower GPU memory availability (e.g., RTX 3080, 4080, 5070).

Below, we provide a simple example showing how to run Qwen2.5-Omni-7B-GPTQ-Int4 with `gptqmodel`:
```
pip uninstall transformers
pip install git+https://github.com/huggingface/[email protected]
pip install accelerate
pip install gptqmodel==2.0.0
pip install numpy==2.0.0

git clone https://github.com/QwenLM/Qwen2.5-Omni.git
cd low-VRAM-mode/

CUDA_VISIBLE_DEVICES=0 python3 low_VRAM_demo_gptq.py
```
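
The pinned versions above matter for the demo; as a quick optional sanity check (not part of the official instructions), you can print what actually got installed before launching the script:

```python
# Optional sanity check: confirm the pinned packages resolved as expected.
from importlib.metadata import version

for pkg in ("transformers", "accelerate", "gptqmodel", "numpy"):
    print(pkg, version(pkg))
```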
We offer a toolkit to help you handle various types of audio and visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved audio, images and videos. You can install it using the following command and make sure your system has `ffmpeg` installed:
```
pip install qwen-omni-utils[decord] -U
```
If you are not using Linux, you might not be able to install `decord` from PyPI. In that case, you can use `pip install qwen-omni-utils -U`, which falls back to torchvision for video processing. However, you can still [install decord from source](https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source) so that decord is used when loading video.
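
As a brief illustration of the toolkit, the minimal sketch below prepares a chat-style conversation for the model. The conversation layout follows the Qwen2.5-Omni usage examples; the media URL is a placeholder to replace with your own file path, URL, or base64 data:

```python
# Minimal sketch: qwen-omni-utils resolves local paths, URLs, and base64 media
# referenced in a conversation into inputs ready for the model's processor.
from qwen_omni_utils import process_mm_info

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "https://example.com/clip.mp4"},  # placeholder URL
            {"type": "text", "text": "What is happening in this video?"},
        ],
    },
]

# Extract the audio, image, and video inputs referenced by the conversation.
audios, images, videos = process_mm_info(conversation, use_audio_in_video=True)
# These outputs are then passed to the Qwen2.5-Omni processor together with the
# chat-templated text; see the low-VRAM demo script for a complete pipeline.
```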
### Performance and GPU memory requirements
The following two tables compare the performance and GPU memory consumption of Qwen2.5-Omni-7B-GPTQ-Int4 and Qwen2.5-Omni-7B on selected evaluation benchmarks. The GPTQ-Int4 model maintains comparable performance while reducing GPU memory requirements by more than 50%, enabling a broader range of devices to run the high-performance Qwen2.5-Omni-7B model. Note that the GPTQ-Int4 variant has slightly slower inference than the native model due to quantization overhead and CPU offloading.

| Evaluation Set | Task | Metrics | Qwen2.5-Omni-7B | Qwen2.5-Omni-7B-GPTQ-Int4 |
|----------------|------|---------|-----------------|---------------------------|
| LibriSpeech test-other | ASR | WER ⬇️ | 3.4 | 3.71 |
| WenetSpeech test-net | ASR | WER ⬇️ | 5.9 | 6.62 |
| Seed-TTS test-hard | TTS (Speaker: Chelsie) | WER ⬇️ | 8.7 | 10.3 |
| MMLU-Pro | Text -> Text | Accuracy ⬆️ | 47.0 | 43.76 |
| OmniBench | Speech -> Text | Accuracy ⬆️ | 56.13 | 53.59 |
| VideoMME | Multimodality -> Text | Accuracy ⬆️ | 72.4 | 68.0 |

| Model | Precision | 15 s Video | 30 s Video | 60 s Video |
|--------------|-----------|------------|------------|------------|
| Qwen-Omni-7B | FP32 | 93.56 GB | Not Recommended | Not Recommended |
| Qwen-Omni-7B | BF16 | 31.11 GB | 41.85 GB | 60.19 GB |
| Qwen-Omni-7B | GPTQ-Int4 | 11.64 GB | 17.43 GB | 29.51 GB |
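
When deciding between the BF16 and GPTQ-Int4 variants, it can help to compare the table above with the VRAM actually free on your GPU. A small helper sketch, assuming a single CUDA device:

```python
# Report free/total VRAM on CUDA device 0 (values are returned in bytes).
import torch

free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 1024**3:.2f} GiB / total: {total / 1024**3:.2f} GiB")
```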
## Citation