How to control reasoning effort? Heavy Mode/low-latency mode?
#10
by
huggingfacemotnt
- opened
Thanks for the great model!
Is there a way to control the reasoning effort, or reasoning length? I see "low-latency mode" and "Heavy Mode" referenced in the model card and the documentation, but it's unclear how to use it. Is it as simple as adding something like:
"Reasoning: low"
to the system prompt like with other models? I'm using either vllm or sglang if it matters.