Update README.md
README.md
CHANGED
|
@@ -21,21 +21,19 @@ pipeline_tag: audio-text-to-text
 
 # Voxtral Small 1.0 (24B) - 2507
 
-Voxtral Small is an enhancement of [Mistral Small 3](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501), incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription and understanding.
+Voxtral Small is an enhancement of [Mistral Small 3](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501), incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
 
 Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral-2507).
 
-Both Voxtral models go beyond transcription with capabilities that include:
-
-
 ## Key Features
 
 Voxtral builds upon Mistral Small 3 with powerful audio understanding capabilities.
-- **
-- **
-- **
-- **
-- **
+- **Dedicated transcription mode**: Voxtral can operate in a pure speech transcription mode to maximize performance. By default, Voxtral automatically predicts the source audio language and transcribes the text accordingly
+- **Long-form context**: With a 32k token context length, Voxtral handles audios up to 30 minutes for transcription, or 40 minutes for understanding
+- **Built-in Q&A and summarization**: Supports asking questions directly through audio. Analyze audio and generate structured summaries without the need for separate ASR and language models
+- **Natively multilingual**: Automatic language detection and state-of-the-art performance in the world’s most widely used languages (English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian)
+- **Function-calling straight from voice**: Enables direct triggering of backend functions, workflows, or API calls based on spoken user intents
+- **Highly capable at text**: Retains the text understanding capabilities of its language model backbone, Mistral Small 3.1
 
 ## Benchmark Results
 
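
The long-form context limits quoted in the Key Features list (about 30 minutes of audio for transcription, 40 minutes for understanding) can be checked before a request is sent. The following is a minimal sketch under stated assumptions: `Mode`, `MAX_MINUTES`, `fits_in_context`, and `split_into_windows` are hypothetical names for illustration and are not part of any Voxtral or Mistral API, and the fixed-window split is a naive strategy that ignores sentence or speaker boundaries.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the two usage modes named in the README."""
    TRANSCRIPTION = "transcription"
    UNDERSTANDING = "understanding"


# Approximate per-request audio limits, in minutes, as quoted in the README.
MAX_MINUTES = {Mode.TRANSCRIPTION: 30, Mode.UNDERSTANDING: 40}


def fits_in_context(duration_seconds: float, mode: Mode) -> bool:
    """Return True if a clip of this length is within the quoted limit."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds <= MAX_MINUTES[mode] * 60


def split_into_windows(duration_seconds: float, mode: Mode) -> list[tuple[float, float]]:
    """Split a clip into consecutive (start, end) windows, in seconds,
    each no longer than the quoted limit for the given mode."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    total = float(duration_seconds)
    limit = float(MAX_MINUTES[mode] * 60)
    windows = []
    start = 0.0
    while start < total:
        end = min(start + limit, total)
        windows.append((start, end))
        start = end
    return windows
```

For example, a 75-minute recording intended for transcription would be split into two 30-minute windows and one 15-minute window. A real pipeline would likely prefer to cut at silences rather than at fixed offsets.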