patrickvonplaten committed
Commit f4172aa · verified · 1 Parent(s): 78967c9

Update README.md

Files changed (1):
  1. README.md +34 -63

README.md CHANGED
@@ -56,9 +56,11 @@ Voxtral builds upon Mistral Small 3 with powerful audio understanding capabiliti
  The model can be used with the following frameworks;
  - [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended)

- **Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.
-
- **Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs.
+ **Recommended settings**:
+
+ - `temperature=0.2` and `top_p=0.95` for chat completion (*e.g. Audio Understanding*) and `temperature=0.0` for transcription
+ - Multiple audios per message and multiple user turns with audio are supported
+ - System prompts are not yet supported

  ### vLLM (recommended)

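The recommended settings above map one-to-one onto request parameters of the OpenAI-compatible API. A minimal sketch of a chat-completion call with the recommended sampling values, assuming a vLLM server is already running (host, port, and prompt are placeholders):

```py
from openai import OpenAI

# Placeholder host/port; point this at your running vLLM server.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# vLLM serves a single model; pick up its id from the /v1/models endpoint.
model = client.models.list().data[0].id

# Chat completion (e.g. audio understanding): temperature=0.2, top_p=0.95.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Describe the last audio clip."}],
    temperature=0.2,
    top_p=0.95,
)
print(response.choices[0].message.content)
```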
@@ -66,20 +68,34 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).

  #### Installation

- Make sure to install [`vLLM >= 0.#.#`](https://github.com/vllm-project/vllm/releases/tag/v0.#.#):
+ Make sure to install vLLM from "main" (nightly wheels):

  ```
- pip install vllm --upgrade
+ pip install -U "vllm[audio]" \
+     --pre \
+     --extra-index-url https://wheels.vllm.ai/nightly
  ```

- Doing so should automatically install [`mistral_common >= 1.#.#`](https://github.com/mistralai/mistral-common/releases/tag/v1.#.#).
+ Doing so should automatically install [`mistral_common >= 1.8.0`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.0).

  To check:
  ```
  python -c "import mistral_common; print(mistral_common.__version__)"
  ```

- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
+ #### Offline
+
+ You can test that your vLLM setup works as expected by cloning the vLLM repo:
+
+ ```sh
+ git clone https://github.com/vllm-project/vllm && cd vllm
+ ```
+
+ and then running:
+
+ ```sh
+ python examples/offline_inference/audio_language.py --num-audios 2 --model-type voxtral
+ ```

  #### Serve

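Beyond printing the version, a small sketch that enforces the 1.8.0 floor programmatically (assumes the `packaging` library, which ships with most Python environments):

```py
from importlib.metadata import version

from packaging.version import Version

# Voxtral support needs mistral_common >= 1.8.0.
installed = Version(version("mistral_common"))
assert installed >= Version("1.8.0"), f"mistral_common {installed} is too old; need >= 1.8.0"
print(f"mistral_common {installed} OK")
```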
@@ -88,7 +104,7 @@ We recommend that you use Voxtral-Small-24B-2507 in a server/client setting.
  1. Spin up a server:

  ```
- vllm serve mistralai/Voxtral-Small-24B-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
+ vllm serve mistralai/Voxtral-Small-24B-2507 --tokenizer_mode mistral --config_format mistral --load_format mistral --tensor-parallel-size 2
  ```

  **Note:** Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
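Before moving on to the client snippets, it is worth confirming the server actually came up; a hedged check via the OpenAI-compatible /v1/models endpoint (placeholder host):

```py
from openai import OpenAI

# Placeholder host; use the machine where `vllm serve` is running.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# The listed id should be mistralai/Voxtral-Small-24B-2507.
for model in client.models.list().data:
    print(model.id)
```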
@@ -105,56 +121,6 @@ Leverage the audio capabilities of Voxtral-Small-24B-2507 to chat.
  <summary>Python snippet</summary>

  ```py
- TODO
- ```
- </details>
-
- #### Transcription
-
- Voxtral-Small-24B-2507 has powerfull transcription capabilities!
-
- <details>
- <summary>Python snippet</summary>
-
- ```python
- TODO
- ```
- </details>
-
- #### Function calling
-
- Voxtral-Small-24B-2507 is excellent at function / tool calling tasks via vLLM. *E.g.:*
-
- <details>
- <summary>Python snippet</summary>
-
- ```py
- ```
-
- </details>
-
- # ORIGINAL
-
- ```
- VLLM_USE_PRECOMPILED=1 pip install --editable .\[audio\]
- ```
-
- of: https://github.com/vllm-project/vllm/pull/20970#pullrequestreview-3019578541
-
- # Examples
-
- ## Client/Server
-
- ### Server
-
- ```sh
- vllm serve mistralai/voxtral-small --tokenizer_mode mistral --config_format mistral --load_format mistral --max_model_len 32768
- ```
-
- ### Client - Chat
-
- ```py
- #!/usr/bin/env python3
  from mistral_common.protocol.instruct.messages import TextChunk, AudioChunk, UserMessage, AssistantMessage, RawAudio
  from mistral_common.audio import Audio
  from huggingface_hub import hf_hub_download
@@ -163,7 +129,7 @@ from openai import OpenAI

  # Modify OpenAI's API key and API base to use vLLM's API server.
  openai_api_key = "EMPTY"
- openai_api_base = "http://slurm-h100-reserved-rno-199-087:8000/v1"
+ openai_api_base = "http://<your-server-host>:8000/v1"

  client = OpenAI(
      api_key=openai_api_key,
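The body of the chat snippet between these context lines is not shown in the diff. A sketch of the missing middle, assuming the mistral_common helpers from the imports above are wired together as in other Voxtral examples (sample file and question are placeholders):

```py
from huggingface_hub import hf_hub_download
from mistral_common.audio import Audio
from mistral_common.protocol.instruct.messages import AudioChunk, TextChunk, UserMessage
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://<your-server-host>:8000/v1")
model = client.models.list().data[0].id

# Placeholder sample; any local audio file works just as well.
audio_file = hf_hub_download("patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset")
audio_chunk = AudioChunk.from_audio(Audio.from_file(audio_file, strict=False))
text_chunk = TextChunk(text="What can you tell me about this speech?")

# mistral_common messages convert to OpenAI-compatible dicts via .to_openai().
messages = [UserMessage(content=[audio_chunk, text_chunk]).to_openai()]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.2,
    top_p=0.95,
)
print(response.choices[0].message.content)
```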
@@ -220,10 +186,16 @@ content = response.choices[0].message.content
  print(30 * "=" + "BOT 2" + 30 * "=")
  print(content)
  ```
+ </details>

- ### Client - Transcribe
+ #### Transcription

- ```py
+ Voxtral-Small-24B-2507 has powerful transcription capabilities!
+
+ <details>
+ <summary>Python snippet</summary>
+
+ ```python
  from mistral_common.protocol.transcription.request import TranscriptionRequest
  from mistral_common.protocol.instruct.messages import RawAudio
  from mistral_common.audio import Audio
@@ -233,7 +205,7 @@ from openai import OpenAI

  # Modify OpenAI's API key and API base to use vLLM's API server.
  openai_api_key = "EMPTY"
- openai_api_base = "http://slurm-h100-reserved-rno-199-087:8000/v1"
+ openai_api_base = "http://<your-server-host>:8000/v1"

  client = OpenAI(
      api_key=openai_api_key,
@@ -252,5 +224,4 @@ req = TranscriptionRequest(model=model, audio=audio, language="en").to_openai(ex
  response = client.audio.transcriptions.create(**req)
  print(response)
  ```
-
-
+ </details>
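As with the chat snippet, only the edges of the transcription example appear in the hunks above. A sketch of the full flow, assuming the RawAudio/TranscriptionRequest usage visible in the surrounding context lines (sample file and host are placeholders):

```py
from huggingface_hub import hf_hub_download
from mistral_common.audio import Audio
from mistral_common.protocol.instruct.messages import RawAudio
from mistral_common.protocol.transcription.request import TranscriptionRequest
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://<your-server-host>:8000/v1")
model = client.models.list().data[0].id

# Placeholder sample; any local audio file works just as well.
audio_file = hf_hub_download("patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset")
audio = RawAudio.from_audio(Audio.from_file(audio_file, strict=False))

# temperature=0.0 is recommended for transcription; .to_openai() turns the
# request into kwargs for the OpenAI-compatible transcription endpoint.
req = TranscriptionRequest(
    model=model, audio=audio, language="en", temperature=0.0
).to_openai(exclude=("top_p", "seed"))
response = client.audio.transcriptions.create(**req)
print(response)
```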