---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
datasets:
- open-thoughts/OpenThoughts2-1M
- Vinnnf/Hybrid-OpenThoughts2-1M-1.5B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
# Thinkless: LLM Learns When to Think

<table>
<thead>
</thead>
<tbody>
<tr>
<td>📄 <strong>Paper Link</strong></td>
<td><a href="https://arxiv.org/abs/2505.13379">ArXiv</a></td>
</tr>
<tr>
<td>💻 <strong>RL Code</strong></td>
<td><a href="https://github.com/VainF/Thinkless">VainF/Thinkless</a></td>
</tr>
<tr>
<td>💻 <strong>SFT Code</strong></td>
<td><a href="https://github.com/VainF/Reasoning-SFT">VainF/Reasoning-SFT</a></td>
</tr>
<tr>
<td>🤖 <strong>RL Model</strong></td>
<td><a href="https://huggingface.co/Vinnnf/Thinkless-1.5B-RL-DeepScaleR">Thinkless-1.5B-RL-DeepScaleR</a></td>
</tr>
<tr>
<td>🐣 <strong>Warmup Model</strong></td>
<td><a href="https://huggingface.co/Vinnnf/Thinkless-1.5B-Warmup">Thinkless-1.5B-Warmup</a></td>
</tr>
<tr>
<td>📊 <strong>Data for Warmup</strong></td>
<td><a href="https://huggingface.co/datasets/Vinnnf/Hybrid-OpenThoughts2-1M-1.5B">Hybrid-OpenThoughts2-1M-1.5B</a></td>
</tr>
<tr>
<td>📊 <strong>Data for RL</strong></td>
<td><a href="https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset">agentica-org/DeepScaleR-Preview-Dataset</a></td>
</tr>
<tr>
<td> 🌐 <strong>Project Page</strong></td>
<td><a href="https://sites.google.com/view/eagle-llm">Thinkless Website</a></td>
</tr>
</tbody>
</table>

## Introduction

> [!NOTE]
> ***Can LLMs learn when to think?***

We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning, based on both task complexity and the model's ability. Thinkless is trained under a reinforcement learning paradigm and employs two control tokens: `<short>` for concise responses and `<think>` for detailed reasoning. At the core of our method is a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which decomposes the learning objective of hybrid reasoning into two components: (1) a control token loss that governs the selection of the reasoning mode, and (2) a response loss that improves the accuracy of the generated answers. This decoupled formulation enables fine-grained control over the contribution of each objective, stabilizing training and effectively preventing the collapse observed in vanilla GRPO. Empirically, on several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless reduces the usage of long-chain thinking by 50%–90%, significantly lowering the computational cost of Reasoning Language Models.

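To make the decoupling concrete, below is a minimal sketch of how such a per-rollout objective could be assembled. This is an illustration under stated assumptions, not the paper's implementation: the coefficient `alpha` and the per-part normalization are hypothetical, and real GRPO training would additionally involve importance ratios, clipping, and group-wise advantage computation (see the RL code linked above for the actual method).

```python
import torch

def decoupled_loss(logprobs: torch.Tensor, advantage: float, alpha: float = 0.001) -> torch.Tensor:
    """Sketch of a decoupled policy-gradient objective for one rollout.

    logprobs:  (T,) log-probabilities of the sampled tokens, where token 0 is
               the control token (<short> or <think>) and tokens 1..T-1 are
               the response.
    advantage: group-relative advantage of this rollout.
    alpha:     hypothetical coefficient balancing mode selection vs. accuracy.
    """
    control_loss = -advantage * logprobs[0]           # (1) mode-selection term
    response_loss = -advantage * logprobs[1:].mean()  # (2) answer-accuracy term,
                                                      #     normalized by response length
    return alpha * control_loss + response_loss       # decoupled weighting
```

Roughly, the motivation is that in a coupled objective the single control token shares one sequence-level normalization with thousands of response tokens, so its gradient signal is diluted; normalizing the two terms separately is what allows their contributions to be rebalanced.
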
## Pipeline

## QuickStart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vinnnf/Thinkless-1.5B-Warmup"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

instruction = "Please reason step by step, and put your final answer within \\boxed{}."
prompt = f"{instruction}\nThe arithmetic mean of 7, 2, $x$ and 10 is 9. What is the value of $x$?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Append a control token to select the reasoning mode:
# <think> for long-form reasoning, <short> for a concise answer.
think_mode = True
if think_mode:
    text = f"{text}<think>"
else:
    text = f"{text}<short>"

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Strip the prompt, keeping only the newly generated tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

num_tokens = len(generated_ids[0])
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text + response)
print(f"\nThink Mode: {think_mode}")
print(f"Number of tokens: {num_tokens}")
```
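The snippet above forces the mode by appending a control token, which is how the warmup model is driven. For the RL model (Thinkless-1.5B-RL-DeepScaleR), the intended behavior is that the model emits the control token itself. Below is a minimal sketch of checking which mode it chose, reusing the `tokenizer`, `model`, and `messages` objects from above; it assumes the RL checkpoint is loaded and emits a control token as its first generated token.

```python
# Let the RL model pick its own mode: do NOT append <think>/<short> here.
raw_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([raw_text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
new_ids = output_ids[0][inputs.input_ids.shape[1]:]

# The first generated token should be the control token; decode it without
# skipping special tokens in case <think>/<short> are registered as special.
chosen_mode = tokenizer.decode(new_ids[:1])
print(f"Model-selected mode: {chosen_mode}")
```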
## Citation
If you find this work helpful, please cite:
```bibtex
@article{fang2025thinkless,
title={Thinkless: LLM Learns When to Think},
author={Fang, Gongfan and Ma, Xinyin and Wang, Xinchao},
journal={arXiv preprint arXiv:2505.13379},
year={2025}
}
```