Chat Template Logic Issue: Ambiguous Default Thinking Mode

#27
by QIN2DIM - opened

Problem

When the thinking variable is not explicitly defined by the client, the template defaults to thinking mode ON, causing OpenAI-compatible clients to be unable to distinguish reasoning_content from regular content.

Root Cause

In the add_generation_prompt section:

{%- if thinking is defined and thinking is false -%}
<think></think>
{%- else -%}
<think>
{%- endif -%}
  • When thinking=false: outputs <think></think> ✓
  • When thinking=true OR undefined: outputs <think> (unclosed) ✗
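This branch can be checked in isolation. A minimal sketch, rendering just the quoted `add_generation_prompt` branch with the `jinja2` library (not the full chat template):

```python
from jinja2 import Template

# The current branch, exactly as quoted above, as a standalone template.
snippet = (
    "{%- if thinking is defined and thinking is false -%}"
    "<think></think>"
    "{%- else -%}"
    "<think>"
    "{%- endif -%}"
)
tpl = Template(snippet)

print(tpl.render(thinking=False))  # <think></think>
print(tpl.render(thinking=True))   # <think>
print(tpl.render())                # <think>  (undefined falls into the else branch)
```

Note that `thinking is false` only matches an explicit `False`; an undefined variable fails the `is defined` test and lands in the `else` branch, which is why the default is thinking ON.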

When thinking is undefined:

  1. Template ends with <think>
  2. Model generates: reasoning_content</think>actual_content
  3. Client receives merged output but cannot determine if thinking was enabled
  4. </think> appears concatenated with content without clear separation
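The ambiguity can be sketched in a few lines; `parse` below is a hypothetical naive client-side parser, not any particular library's:

```python
# If the prompt ended with "<think>", the completion carries reasoning first;
# the client only sees the completion and must guess which case it is in.
completion_on = "because 2+2=4</think>The answer is 4."   # thinking was ON
completion_off = "The answer is 4."                       # thinking was OFF

def parse(completion, thinking_enabled):
    """Split a completion into (reasoning_content, content)."""
    if thinking_enabled and "</think>" in completion:
        reasoning, _, content = completion.partition("</think>")
        return reasoning, content
    return None, completion

print(parse(completion_on, thinking_enabled=True))
# ('because 2+2=4', 'The answer is 4.')
print(parse(completion_on, thinking_enabled=False))
# (None, 'because 2+2=4</think>The answer is 4.')  <- tag leaks into content
print(parse(completion_off, thinking_enabled=False))
# (None, 'The answer is 4.')
```

The middle case is the failure mode described above: if the client wrongly assumes thinking was off, the raw `</think>` tag and the reasoning text leak into `content`.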

Fix

Invert the default behavior to thinking OFF when undefined:

-{%- if thinking is defined and thinking is false -%}
-<think></think>
-{%- else -%}
+{%- if thinking is defined and thinking is true -%}
 <think>
+{%- else -%}
+<think></think>
 {%- endif -%}

Behavior After Fix

thinking value   Output            Meaning
--------------   ---------------   ----------------------------------
true             <think>           Expect model to generate reasoning
false            <think></think>   No reasoning
undefined        <think></think>   No reasoning (safe default)

This ensures clients can reliably parse responses by checking for <think></think> (thinking off) vs <think>content...</think> (thinking on).
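The inverted branch can be verified the same way. A sketch rendering the proposed branch on its own with `jinja2`:

```python
from jinja2 import Template

# The proposed (inverted-default) branch as a standalone template.
fixed = (
    "{%- if thinking is defined and thinking is true -%}"
    "<think>"
    "{%- else -%}"
    "<think></think>"
    "{%- endif -%}"
)
tpl = Template(fixed)

print(tpl.render(thinking=True))   # <think>          -> reasoning expected
print(tpl.render(thinking=False))  # <think></think>  -> no reasoning
print(tpl.render())                # <think></think>  -> safe default
```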

Moonshot AI org

Hi, we set thinking mode as default to make it compatible with our official API behavior. The problem you describe is likely a bug in the reasoning parser. For example, sglang fixed a similar issue recently: https://github.com/sgl-project/sglang/pull/17901

It seems to be true. 🤯

QIN2DIM changed discussion status to closed

That's not the problem, but the suffix index seems wrong.

current chat_template includes:

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}

But it should be

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg:] -%}
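The difference between the two slice conventions can be seen with a toy message list (roles only; the index value is hypothetical):

```python
# Toy conversation; strings stand in for message dicts.
messages = ["sys", "user0", "assistant0", "user1", "assistant1(tool_call)", "tool1"]
last = 2  # index of the last assistant message without tool calls

# Current template: that assistant message stays in hist_msgs.
hist_current, suffix_current = messages[:last + 1], messages[last + 1:]
# Proposed change: it moves (with its reasoning_content) into suffix_msgs.
hist_proposed, suffix_proposed = messages[:last], messages[last:]

print(suffix_current)   # ['user1', 'assistant1(tool_call)', 'tool1']
print(suffix_proposed)  # ['assistant0', 'user1', 'assistant1(tool_call)', 'tool1']
```

Whether the boundary message belongs in history or suffix is exactly the design question discussed below.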

There is another related issue with the hist/suffix split: when all assistant messages have tool_calls (common in multi-turn tool-call conversations), last_non_tool_call_assistant_msg stays at -1, causing all messages to become suffix_msgs. This makes reasoning_content from every historical turn accumulate in the prompt, eventually causing the model to degenerate into repetitive output after ~10-18 rounds.

Fix: add a fallback after the existing loop β€” when no non-tool-call assistant is found, split at the last assistant message:

{%- if ns.last_non_tool_call_assistant_msg == -1 -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
{%- if messages[idx]['role'] == 'assistant' -%}
{%- set ns.last_non_tool_call_assistant_msg = idx - 1 -%}
{%- break -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
This ensures only the latest turn's reasoning_content is preserved, while reasoning_content from older turns is cleared, matching the template's intended behavior for hist_msgs.
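A plain-Python sketch of the proposed fallback, using a toy conversation in which every assistant message carries tool_calls (the first loop approximates the template's existing search; the message shapes are illustrative):

```python
messages = [
    {"role": "user"},
    {"role": "assistant", "tool_calls": ["call_a"]},
    {"role": "tool"},
    {"role": "assistant", "tool_calls": ["call_b"]},
    {"role": "tool"},
]

# Existing loop: find the last assistant message WITHOUT tool calls.
last = -1
for idx in range(len(messages) - 1, -1, -1):
    m = messages[idx]
    if m["role"] == "assistant" and not m.get("tool_calls"):
        last = idx
        break

# Proposed fallback: none found, so split just before the last assistant
# message; only its reasoning_content would remain in suffix_msgs.
if last == -1:
    for idx in range(len(messages) - 1, -1, -1):
        if messages[idx]["role"] == "assistant":
            last = idx - 1
            break

hist_msgs, suffix_msgs = messages[:last + 1], messages[last + 1:]
print([m["role"] for m in suffix_msgs])  # ['assistant', 'tool']
```

Without the fallback, `last` would stay at -1 and the whole list would become suffix_msgs, which is the accumulation behavior described above.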

Moonshot AI org

That's not the problem, but the suffix index seems wrong.

current chat_template includes:

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}

But it should be

{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg:] -%}

Could you clarify where the bug is?

  • An assistant message without tool calls marks the end of a conversation turn and should be part of the history messages.
Moonshot AI org

There is another related issue with the hist/suffix split: when all assistant messages have tool_calls (common in multi-turn tool-call conversations), last_non_tool_call_assistant_msg stays at -1, causing all messages to become suffix_msgs. This makes reasoning_content from every historical turn accumulate in the prompt, eventually causing the model to degenerate into repetitive output after ~10-18 rounds.

Fix: add a fallback after the existing loop β€” when no non-tool-call assistant is found, split at the last assistant message:

{%- if ns.last_non_tool_call_assistant_msg == -1 -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
{%- if messages[idx]['role'] == 'assistant' -%}
{%- set ns.last_non_tool_call_assistant_msg = idx - 1 -%}
{%- break -%}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
This ensures only the latest turn's reasoning_content is preserved, while reasoning_content from older turns is cleared, matching the template's intended behavior for hist_msgs.

When every assistant message contains tool calls, the multi-step conversation is continuing, and retaining all thinking content is the intended behavior by design.
I'm not sure if an excessive amount of thinking content in the prompt would degrade model performance (would it?), and I'm uncertain whether setting a maximum limit on the number of reserved thinking content entries would help. You're welcome to experiment with such limits, but preserving the complete thinking content for all tool calls remains the expected behavior.
