# Chat Template Logic Issue: Ambiguous Default Thinking Mode
## Problem

When the `thinking` variable is not explicitly defined by the client, the template defaults to thinking mode ON, so OpenAI-compatible clients cannot distinguish `reasoning_content` from regular `content`.
## Root Cause

In the `add_generation_prompt` section:

```jinja
{%- if thinking is defined and thinking is false -%}
<think></think>
{%- else -%}
<think>
{%- endif -%}
```
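As a quick check, rendering this fragment directly shows that an undefined `thinking` falls into the `else` branch (a sketch using the `jinja2` package, with the fragment inlined for illustration):

```python
from jinja2 import Template

# The current template fragment, inlined for illustration.
current = Template(
    "{%- if thinking is defined and thinking is false -%}"
    "<think></think>"
    "{%- else -%}"
    "<think>"
    "{%- endif -%}"
)

print(current.render(thinking=False))  # <think></think>
print(current.render(thinking=True))   # <think>
print(current.render())                # <think> -- undefined behaves like thinking=true
```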
- When `thinking=false`: outputs `<think></think>`
- When `thinking=true` or `thinking` is undefined: outputs `<think>` (unclosed)

When `thinking` is undefined:

- The template ends with `<think>`
- The model generates: `reasoning_content</think>actual_content`
- The client receives the merged output but cannot determine whether thinking was enabled
- `</think>` appears concatenated with the content without clear separation
## Fix

Invert the default behavior to thinking OFF when `thinking` is undefined:

```diff
-{%- if thinking is defined and thinking is false -%}
-<think></think>
-{%- else -%}
+{%- if thinking is defined and thinking is true -%}
 <think>
+{%- else -%}
+<think></think>
 {%- endif -%}
```
## Behavior After Fix

| `thinking` value | Output | Meaning |
|---|---|---|
| `true` | `<think>` | Expect model to generate reasoning |
| `false` | `<think></think>` | No reasoning |
| undefined | `<think></think>` | No reasoning (safe default) |
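These behaviors can be verified by rendering the fixed fragment (a sketch using the `jinja2` package, fragment inlined for illustration):

```python
from jinja2 import Template

# The fixed template fragment, inlined for illustration.
fixed = Template(
    "{%- if thinking is defined and thinking is true -%}"
    "<think>"
    "{%- else -%}"
    "<think></think>"
    "{%- endif -%}"
)

print(fixed.render(thinking=True))   # <think>
print(fixed.render(thinking=False))  # <think></think>
print(fixed.render())                # <think></think> -- safe default when undefined
```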
This ensures clients can reliably parse responses by checking for `<think></think>` (thinking off) vs `<think>content...</think>` (thinking on).
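On the client side, parsing then becomes unambiguous. A minimal sketch (hypothetical helper, assuming the fixed template): with thinking off, the closed `<think></think>` pair is already in the prompt, so the generated text contains no `</think>` at all; with thinking on, the generation is `reasoning</think>answer`.

```python
def split_reasoning(generated: str):
    """Split generated text into (reasoning_content, content).

    Hypothetical client-side helper, assuming the fixed template:
    a '</think>' in the generation means thinking was on.
    """
    if "</think>" in generated:
        reasoning, _, content = generated.partition("</think>")
        return reasoning.strip(), content.strip()
    return None, generated.strip()

print(split_reasoning("step 1... step 2...</think>The answer is 4."))
# ('step 1... step 2...', 'The answer is 4.')
print(split_reasoning("The answer is 4."))
# (None, 'The answer is 4.')
```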
Hi, we set thinking mode on by default to keep it compatible with our official API behavior. The problem you describe is likely a bug in the reasoning parser. For example, sglang recently fixed a similar issue: https://github.com/sgl-project/sglang/pull/17901
It seems to be true. 🤯
That's not a problem, but the suffix index seems wrong.

The current `chat_template` includes:

```jinja
{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}
```
But it should be:

```jinja
{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg:] -%}
```
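The difference between the two splits can be seen with a toy message list (the messages and index are hypothetical; `idx` stands for `ns.last_non_tool_call_assistant_msg`):

```python
# Toy conversation; index 3 is the last assistant message without tool calls.
messages = ["user_0", "assistant_0", "user_1", "assistant_1"]
idx = 3

# Current template: the final assistant message lands in history,
# so its reasoning_content would be cleared.
hist, suffix = messages[:idx + 1], messages[idx + 1:]
print(hist, suffix)    # ['user_0', 'assistant_0', 'user_1', 'assistant_1'] []

# Proposed change: the final assistant message stays in the suffix,
# so its reasoning_content is preserved.
hist2, suffix2 = messages[:idx], messages[idx:]
print(hist2, suffix2)  # ['user_0', 'assistant_0', 'user_1'] ['assistant_1']
```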
There is another related issue with the hist/suffix split: when all assistant messages have `tool_calls` (common in multi-turn tool-call conversations), `last_non_tool_call_assistant_msg` stays at `-1`, causing all messages to become `suffix_msgs`. This makes `reasoning_content` from every historical turn accumulate in the prompt, eventually causing the model to degenerate into repetitive output after roughly 10-18 rounds.
Fix: add a fallback after the existing loop so that, when no non-tool-call assistant message is found, the split happens at the last assistant message:

```jinja
{%- if ns.last_non_tool_call_assistant_msg == -1 -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
{%- if messages[idx]['role'] == 'assistant' -%}
{%- set ns.last_non_tool_call_assistant_msg = idx - 1 -%}
{%- break -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
```
This ensures only the latest turn's `reasoning_content` is preserved, while older turns have theirs cleared, matching the template's intended behavior for `hist_msgs`.
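The fallback's effect can be sketched in Python (a rough mirror of the template logic, not the template itself; message dicts are simplified):

```python
def split_messages(messages):
    """Mirror of the template's split plus the proposed fallback (sketch)."""
    last = -1
    # Existing loop: find the last assistant message without tool calls.
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and not m.get("tool_calls"):
            last = i
    # Proposed fallback: if every assistant turn carried tool calls,
    # split just before the last assistant message instead.
    if last == -1:
        for i in range(len(messages) - 1, -1, -1):
            if messages[i]["role"] == "assistant":
                last = i - 1
                break
    return messages[:last + 1], messages[last + 1:]

# All assistant turns carry tool calls: without the fallback everything
# would be suffix; with it, only the latest turn stays in the suffix.
msgs = [
    {"role": "user"},
    {"role": "assistant", "tool_calls": ["call_a"]},
    {"role": "tool"},
    {"role": "assistant", "tool_calls": ["call_b"]},
    {"role": "tool"},
]
hist, suffix = split_messages(msgs)
print(len(hist), len(suffix))  # 3 2
```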
> That's not a problem, but the suffix index seems wrong.
>
> The current `chat_template` includes:
>
> ```jinja
> {# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
> {%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
> {%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}
> ```
>
> But it should be:
>
> ```jinja
> {# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
> {%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg] -%}
> {%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg:] -%}
> ```
Could you clarify where the bug is?
- An assistant message without tool calls marks the end of a conversation turn and should be part of the history messages.
> There is another related issue with the hist/suffix split: when all assistant messages have `tool_calls` (common in multi-turn tool-call conversations), `last_non_tool_call_assistant_msg` stays at `-1`, causing all messages to become `suffix_msgs`. This makes `reasoning_content` from every historical turn accumulate in the prompt, eventually causing the model to degenerate into repetitive output after roughly 10-18 rounds.
>
> Fix: add a fallback after the existing loop so that, when no non-tool-call assistant message is found, the split happens at the last assistant message:
>
> ```jinja
> {%- if ns.last_non_tool_call_assistant_msg == -1 -%}
> {%- for idx in range(messages|length-1, -1, -1) -%}
> {%- if messages[idx]['role'] == 'assistant' -%}
> {%- set ns.last_non_tool_call_assistant_msg = idx - 1 -%}
> {%- break -%}
> {%- endif -%}
> {%- endfor -%}
> {%- endif -%}
> ```
>
> This ensures only the latest turn's `reasoning_content` is preserved, while older turns have theirs cleared, matching the template's intended behavior for `hist_msgs`.
When every assistant message contains tool calls, the multi-step conversation is still in progress, and retaining all thinking content is intended behavior by design.
I'm not sure whether an excessive amount of thinking content in the prompt would degrade model performance (would it?), and I'm uncertain whether capping the number of reserved thinking-content entries would help. You're welcome to experiment with such limits, but preserving the complete thinking content for all tool calls remains the expected behavior.
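For anyone who wants to experiment with such a cap client-side, a hypothetical pruning helper might look like the following (the function name, `keep_last` parameter, and message shape are all assumptions for illustration, not part of any template or API):

```python
def prune_reasoning(messages, keep_last=1):
    """Clear reasoning_content on all but the last `keep_last` assistant
    messages before resending the conversation (hypothetical mitigation)."""
    assistant_idxs = [i for i, m in enumerate(messages) if m.get("role") == "assistant"]
    keep = set(assistant_idxs[-keep_last:]) if keep_last > 0 else set()
    pruned = []
    for i, m in enumerate(messages):
        m = dict(m)  # shallow copy so the caller's list is untouched
        if m.get("role") == "assistant" and i not in keep:
            m.pop("reasoning_content", None)
        pruned.append(m)
    return pruned

msgs = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "reasoning_content": "r1", "tool_calls": ["a"]},
    {"role": "assistant", "reasoning_content": "r2", "tool_calls": ["b"]},
]
out = prune_reasoning(msgs, keep_last=1)
print([("reasoning_content" in m) for m in out])  # [False, False, True]
```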