VocalNet-Qwen3-1.7B Model Card
VocalNet-Qwen3-1.7B is a high-performance, low-latency speech large language model (LLM) that supports both English and Mandarin and is optimized for real-time voice interaction.
The official repository with training and inference code will be open-sourced as soon as possible.
VocalBench Performance
| Model | Knowledge | Reasoning | Creativity | UTMOS | WER | Single-Round | Multi-Round | Instruction Following | Emotional Empathy | Safety | Robust | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mini-Omni (0.5B) | 2.20 | 1.291 | 1.4725 | 4.435 | 19.571 | 1.645 | - | 0.00 | 5.428 | 81.25 | 84.14 | 40.646 |
| Mini-Omni2 (0.5B) | 4.65 | 1.501 | 1.8025 | 4.413 | 36.269 | 1.915 | - | 0.11 | 5.709 | 88.50 | 82.26 | 43.224 |
| SLAM-Omni (0.5B) | 12.05 | 1.875 | 2.5175 | 4.424 | 6.065 | 2.880 | 1.9800 | 3.11 | 6.452 | 90.25 | 77.91 | 54.649 |
| VocalNet-1B (1B) | 43.00 | 2.869 | 3.1800 | 4.437 | 5.123 | 3.335 | 3.2550 | 16.11 | 6.754 | 89.00 | 92.42 | 66.632 |
| VocalNet-Qwen3-1.7B (1.7B) | 45.65 | 3.712 | 3.3625 | 4.353 | 1.775 | 3.450 | 3.6325 | 31.89 | 7.000 | 82.75 | 91.47 | 72.152 |
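As a rough aid for reading the table, the sketch below computes how VocalNet-Qwen3-1.7B differs from VocalNet-1B on each VocalBench column, using the scores exactly as listed. It assumes that WER is the only lower-is-better column and that every other column is higher-is-better; the `compare` helper and the sign convention are illustrative only and are not part of VocalBench.

```python
# VocalBench scores copied from the table (the two VocalNet rows only).
COLUMNS = ["Knowledge", "Reasoning", "Creativity", "UTMOS", "WER",
           "Single-Round", "Multi-Round", "Instruction Following",
           "Emotional Empathy", "Safety", "Robust", "Overall"]

SCORES = {
    "VocalNet-1B": [43.00, 2.869, 3.1800, 4.437, 5.123, 3.335, 3.2550,
                    16.11, 6.754, 89.00, 92.42, 66.632],
    "VocalNet-Qwen3-1.7B": [45.65, 3.712, 3.3625, 4.353, 1.775, 3.450,
                            3.6325, 31.89, 7.000, 82.75, 91.47, 72.152],
}

# Assumption: WER is an error rate (lower is better); all other columns
# are treated as higher-is-better.
LOWER_IS_BETTER = {"WER"}


def compare(baseline: str, candidate: str) -> dict[str, float]:
    """Return per-column differences (candidate vs. baseline).

    The sign is flipped for lower-is-better columns, so a positive value
    always means the candidate scores better on that column.
    """
    deltas = {}
    for col, b, c in zip(COLUMNS, SCORES[baseline], SCORES[candidate]):
        diff = c - b
        # Round to suppress floating-point noise from the subtraction.
        deltas[col] = round(-diff if col in LOWER_IS_BETTER else diff, 4)
    return deltas


if __name__ == "__main__":
    for col, d in compare("VocalNet-1B", "VocalNet-Qwen3-1.7B").items():
        print(f"{col:>22}: {d:+.4f}")
```

Under these assumptions, the newer model improves on most columns, including a lower WER and a higher Overall score, while scoring lower than VocalNet-1B on UTMOS, Safety, and Robust.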