---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
datasets:
- open-thoughts/OpenThoughts2-1M
- Vinnnf/Hybrid-OpenThoughts2-1M-1.5B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

# Thinkless: LLM Learns When to Think

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646a1939c37ca1e12308fe81/SRxJKkSuC0y-oMB7SFeR6.png)

<table>
  <tbody>
    <tr>
      <td>📄 <strong>Paper Link</strong></td>
      <td><a href="https://arxiv.org/abs/2505.13379">ArXiv</a></td>
    </tr>
    <tr>
      <td>💻 <strong>RL Code</strong></td>
      <td><a href="https://github.com/VainF/Thinkless">VainF/Thinkless</a></td>
    </tr>
    <tr>
      <td>💻 <strong>SFT Code</strong></td>
      <td><a href="https://github.com/VainF/Reasoning-SFT">VainF/Reasoning-SFT</a></td>
    </tr>
    <tr>
      <td>🤖 <strong>RL Model</strong></td>
      <td><a href="https://huggingface.co/Vinnnf/Thinkless-1.5B-RL-DeepScaleR">Thinkless-1.5B-RL-DeepScaleR</a></td>
    </tr>
    <tr>
      <td>🐣 <strong>Warmup Model</strong></td>
      <td><a href="https://huggingface.co/Vinnnf/Thinkless-1.5B-Warmup">Thinkless-1.5B-Warmup</a></td>
    </tr>
    <tr>
      <td>📊 <strong>Data for Warmup</strong></td>
      <td><a href="https://huggingface.co/datasets/Vinnnf/Hybrid-OpenThoughts2-1M-1.5B">Hybrid-OpenThoughts2-1M-1.5B</a></td>
    </tr>
    <tr>
      <td>📊 <strong>Data for RL</strong></td>
      <td><a href="https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset">agentica-org/DeepScaleR-Preview-Dataset</a></td>
    </tr>
    <tr>
      <td>🌐 <strong>Project Page</strong></td>
      <td><a href="https://sites.google.com/view/eagle-llm">Thinkless Website</a></td>
    </tr>
  </tbody>
</table>

## Introduction

> [!NOTE] 
> ***Can LLMs learn when to think?***

We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning based on both task complexity and the model's own ability. Thinkless is trained under a reinforcement learning paradigm and employs two control tokens: \<short\> for concise responses and \<think\> for detailed reasoning.

At the core of our method is a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into two components: (1) a control-token loss that governs the selection of the reasoning mode, and (2) a response loss that improves the accuracy of the generated answers. This decoupled formulation enables fine-grained control over the contribution of each objective, stabilizing training and effectively preventing the collapse observed in vanilla GRPO. Empirically, on benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless reduces the usage of long-chain thinking by 50%–90%, significantly cutting the computational cost of Reasoning Language Models.


## Pipeline

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646a1939c37ca1e12308fe81/3mx8EJUyOvCtxPnYTcwbS.png)

## QuickStart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-Warmup"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = f"{instruction}
The arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Choose the reasoning mode by appending a control token:
# <think> elicits long-form reasoning, <short> a concise answer.
think_mode = True
if think_mode:
    text = f"{text}<think>"
else:
    text = f"{text}<short>"

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
num_tokens = len(generated_ids[0])

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text+response)
print(f"
Think Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
```


## Citation
If you find this work helpful, please cite:
```bibtex
@article{fang2025thinkless,
  title={Thinkless: LLM Learns When to Think},
  author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
  journal={arXiv preprint arXiv:2505.13379},
  year={2025}
}
```