Reasoning not parsing correctly within vLLM

#4
by SuperbEmphasis - opened

I have verified mistral-common 1.8.6 is installed. I am running vLLM 0.12.0 in Kubernetes.

However, the response is going into the thinking/reasoning field.

(screenshot of the response)

I also set the flags from the repo exactly:

(APIServer pid=44) INFO 12-03 11:40:29 [entrypoints/utils.py:253] non-default args: {'model_tag': '/models/Ministral-3-14B-Reasoning-2512', 'enable_auto_tool_choice': True, 'tool_call_parser': 'mistral', 'model': '/models/Ministral-3-14B-Reasoning-2512', 'tokenizer_mode': 'mistral', 'trust_remote_code': True, 'max_model_len': 65536, 'served_model_name': ['Ministral-3-14B-Reasoning-2512'], 'config_format': 'mistral', 'load_format': 'mistral', 'reasoning_parser': 'mistral'}
Mistral AI_ org

Hi there, are you using the default system prompt we recommend for reasoning?

Hi pandora-s,

Can we use vllm serve to load the text in SYSTEM_PROMPT.txt, or do we have to send the text from SYSTEM_PROMPT.txt from the client every time?
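For concreteness, this is the kind of per-request call I mean: a minimal sketch assuming a local copy of SYSTEM_PROMPT.txt and a server started with the flags above (the file path, base URL and model name are placeholders, and the maintainer's note below about sending the prompt as chunks rather than one block of text applies here too):

```python
from openai import OpenAI

# Hypothetical local copy of the SYSTEM_PROMPT.txt shipped with the model repo.
with open("SYSTEM_PROMPT.txt", encoding="utf-8") as f:
    SYSTEM_PROMPT = f.read()

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Ministral-3-14B-Reasoning-2512",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # sent by the client on every request
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)
print(response.choices[0].message)
```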

@pandora-s
Yes, I have tried, although I find it odd that when using chat completions it isn't applying the chat template automatically with those flags set?

I grabbed the default system prompt from the Jinja template (thank you for supplying that, btw), and I tried to copy/paste exactly what is in the .txt file you provided, but the thinking still seems to be... off.

The model seems to be starting with a double quotation mark and still thinking anyway. The closing [/THINK] is also not being handled correctly...
(screenshot of the response)

But ultimately, it would make more sense to be able to host the model via vLLM in an environment where an end user might be using the API. It seems odd that I would need to tell them to use a specific prompt for it to work correctly? Thanks for the response!

Mistral AI_ org
edited 4 days ago

Hey 😊 You're not supposed to pass the system prompt as a block of text:
https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512#usage-of-the-model
Could you take a look at the examples and try it out by sending think chunks, please?

Also, we do not use chat templates with vLLM; the chat template is there for the Transformers integration and for libraries that do not support our processing library, mistral-common.
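To illustrate the difference, here is a rough sketch of what that processing looks like when calling mistral-common directly instead of rendering a Jinja chat template; it assumes the mistral-common 1.8.x API, so the exact imports and method names should be checked against the mistral-common documentation:

```python
from mistral_common.protocol.instruct.messages import SystemMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the tokenizer definition shipped with the model repo
# (the same files vLLM reads when tokenizer_mode="mistral").
tokenizer = MistralTokenizer.from_hf_hub("mistralai/Ministral-3-14B-Reasoning-2512")

# mistral-common builds and tokenizes the prompt itself; no chat template is involved.
request = ChatCompletionRequest(
    messages=[
        SystemMessage(content="<recommended reasoning system prompt>"),
        UserMessage(content="How many r's are in 'strawberry'?"),
    ]
)
tokenized = tokenizer.encode_chat_completion(request)
print(tokenized.tokens[:20])  # token ids that would be fed to the model
print(tokenized.text)         # the rendered prompt string
```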

It seems odd that I would need to tell them to use a specific prompt for it to work correctly?

Our reasoning models are, for now, sensitive to the system prompt. Regarding your issue, is it possible for you to prepend the system prompt to your end users' messages on your side? That way you don't have to ask them to add it.
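If it helps, one way to do that without changing end-user clients is a thin wrapper in front of vLLM that injects the system prompt whenever the caller did not provide one. A minimal sketch, using a plain-string placeholder for the prompt (in practice it should be the chunked system prompt from the usage example linked above) and hypothetical host and model names:

```python
from openai import OpenAI

# Placeholder: in practice, use the recommended reasoning system prompt,
# sent in the chunked format from the usage example linked above.
RECOMMENDED_SYSTEM_PROMPT = "<recommended reasoning system prompt>"

client = OpenAI(base_url="http://vllm.internal:8000/v1", api_key="EMPTY")

def chat(user_messages: list[dict]):
    """Forward a request to vLLM, injecting the reasoning system prompt
    if the caller did not supply a system message."""
    messages = list(user_messages)
    if not messages or messages[0].get("role") != "system":
        messages.insert(0, {"role": "system", "content": RECOMMENDED_SYSTEM_PROMPT})
    response = client.chat.completions.create(
        model="Ministral-3-14B-Reasoning-2512",
        messages=messages,
    )
    return response.choices[0].message
```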

That's very cumbersome, tbh.
There should be a good user guide on how to set this up properly in vLLM.
I'm sure people won't even notice that reasoning is not set up properly and will just complain on X that the model is bad.

I'm still not sure how it's supposed to work.

Or maybe it only works in streaming mode for now?
When I disable streaming, "content" remains null, and both reasoning and reasoning_content contain the same text.

ChatCompletion(id='chatcmpl-99f7f2fc6a7ae15a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning='Okay, I need to use the numbers 2, 5, 6, 3 exactly once, along with the operations +, -, Γ—, Γ·, and parentheses, to make the number 24. Let me start by thinking about possible combinations.\n\nFirst, I\'ll list the numbers: 2, 5, 6, 3.\n\nI need to combine these with operations to get 24. Maybe I can start by trying to multiply two numbers and then add or subtract the others.\n\nLet me try multiplying 6 and 4, but 4 isn\'t there. Maybe 6 Γ— 4 is not directly possible. Wait, perhaps 6 Γ— 3 = 18. Then I have 2 and 5 left. 18 + 5 + 2 = 25, which is not 24. Maybe 18 + 5 - 2 = 21, not 24. Hmm.\n\nWhat if I do 6 Γ— 3 = 18, then 18 + 5 = 23, and 23 + 2 = 25. Not 24.\n\nMaybe I should try dividing. Let\'s see, 24 is divisible by 6, so perhaps 24 / 6 = 4. But how to g...

Same issue.
Even using exactly the same code snippet from https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512#usage-of-the-model, except setting streaming=False, results in an empty "content" field: stream.choices[0].message.content is None, everything goes into reasoning_content, and you can't separate the reasoning from the answer.
I checked with temp=0.0: if you just set streaming=False, the text after the reasoning block is simply not present in the response. The reasoning part is the same with streaming on and off.
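In case it's useful to others, here is how the two fields can be separated in streaming mode; a small sketch assuming vLLM's reasoning parser exposes reasoning_content on the streamed deltas (the field is read defensively because the OpenAI client does not declare it), with placeholder URL and model name:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="Ministral-3-14B-Reasoning-2512",
    messages=[{"role": "user", "content": "Use 2, 5, 6, 3 once each to make 24."}],
    stream=True,
)

reasoning_parts, answer_parts = [], []
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Assumption: the reasoning parser routes the [THINK] text to
    # reasoning_content and the final answer to content.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        reasoning_parts.append(reasoning)
    if delta.content:
        answer_parts.append(delta.content)

print("REASONING:\n" + "".join(reasoning_parts))
print("\nANSWER:\n" + "".join(answer_parts))
```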
