How to control reasoning effort? Heavy Mode/low-latency mode?

#10

by huggingfacemotnt - opened 24 days ago

24 days ago

Thanks for the great model!

Is there a way to control the reasoning effort, or reasoning length? I see "low-latency mode" and "Heavy Mode" referenced in the model card and the documentation, but it's unclear how to use it. Is it as simple as adding something like:

"Reasoning: low"

to the system prompt like with other models? I'm using either vllm or sglang if it matters.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment