How to control reasoning effort? Heavy Mode/low-latency mode?

#10
by huggingfacemotnt - opened

Thanks for the great model!

Is there a way to control the reasoning effort, or reasoning length? I see "low-latency mode" and "Heavy Mode" referenced in the model card and the documentation, but it's unclear how to use it. Is it as simple as adding something like:

"Reasoning: low"

to the system prompt like with other models? I'm using either vllm or sglang if it matters.

Sign up or log in to comment