# Ouro-1.4B-Thinking

## Model Description
⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is without warranties for production use.
Ouro-1.4B-Thinking is a reasoning-specialized variant of the Ouro-1.4B base model, enhanced through supervised fine-tuning on high-quality reasoning data.
## Key Features

- **Advanced Reasoning**: Specifically optimized for mathematical and scientific reasoning tasks
- **Compact Size**: Competitive with 4B models despite having only 1.4B parameters
- **Cross-Step Consistency**: Intermediate recurrent outputs can serve as reliable proxies for final answers
- **Explicit Thinking Process**: Trained to generate detailed reasoning steps
## Configuration

### Recurrent Steps and Adaptive Exit

The model's computational behavior can be configured through the `config.json` file:

```json
{
  "total_ut_steps": 4,
  "early_exit_threshold": 1.0
}
```
- `total_ut_steps`: Controls the number of recurrent steps (default: 4). Adjust this value to trade off performance against computation time.
- `early_exit_threshold`: Controls the adaptive exit mechanism (default: 1.0). Lower values encourage earlier exit; 1.0 means all steps are always used.
**Example: Modify recurrent steps**

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
config.total_ut_steps = 3  # Use 3 recurrent steps instead of 4

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-1.4B-Thinking",
    config=config,
    device_map="auto",
)
```
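The adaptive exit threshold can be overridden in the same way. A minimal sketch, using an illustrative value of 0.9 rather than a tuned recommendation:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
config.early_exit_threshold = 0.9  # lower than the default 1.0 to encourage earlier exit

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-1.4B-Thinking",
    config=config,
    device_map="auto",
)
```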
**Note**: vLLM does not currently support the adaptive exit feature due to its inference optimization characteristics. When using vLLM, the model always executes the full number of `total_ut_steps`.
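For vLLM users, a minimal serving sketch is shown below. The chat-template call mirrors the Quick Start example; whether `trust_remote_code=True` is needed depends on how the architecture is registered in your vLLM version, so treat that flag as an assumption.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "ByteDance/Ouro-1.4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt; under vLLM, adaptive exit is not applied,
# so all configured total_ut_steps are always executed.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve: If 2x + 3 = 11, what is x?"}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_name, trust_remote_code=True)  # trust_remote_code is an assumption
outputs = llm.generate([prompt], SamplingParams(temperature=1.0, top_p=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)
```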
## Model Architecture

Based on Ouro-1.4B with additional reasoning fine-tuning:
| Configuration | Value |
|---|---|
| Parameters | 1.4B |
| Layers | 24 |
| Recurrent Steps | 4 |
| Hidden Size | 2048 |
| Attention | Multi-Head Attention (MHA) |
| FFN Activation | SwiGLU |
| Position Embedding | RoPE |
| Vocabulary Size | 49,152 |
| Context Length | 32K (SFT) |
| Normalization | Sandwich RMSNorm |
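These values can be cross-checked against the released checkpoint by dumping its configuration. The snippet below only relies on `total_ut_steps` and `early_exit_threshold`, which are documented above; otherwise it simply prints whatever fields the config object exposes.

```python
from transformers import AutoConfig

# Print the full configuration shipped with the checkpoint.
config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
print(config)

# Fields documented in the Configuration section above.
print("recurrent steps:", config.total_ut_steps)              # 4 by default
print("early exit threshold:", config.early_exit_threshold)   # 1.0 by default
```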
## Training Details

### Pre-training

- **Training Tokens**: 7.7T tokens across 4 stages
- **Base Architecture**: Ouro-1.4B
### Supervised Fine-Tuning

- **Data Size**: ~8.3M examples
- **Data Composition**:
  - Mathematics: 3.5M examples (OpenThoughts3, AceReason-1.1-SFT)
  - Code: 3.2M examples (AceReason, OpenCodeReasoning, Llama-Nemotron, OpenThoughts3)
  - Science: 808K examples (OpenThoughts3, Llama-Nemotron)
  - Chat: 767K examples (DeepWriting-20K)
- **Training**: 2 epochs, max sequence length 32K
- **Optimizer**: Adam (lr=2×10⁻⁵, β=(0.9, 0.95))
- **Scheduler**: Cosine decay
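For reference, a minimal PyTorch sketch of the stated optimizer and schedule; the model, training loop, and total step count below are placeholders for illustration, not the actual training code:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholders for illustration only.
model = torch.nn.Linear(2048, 2048)
total_steps = 10_000

# Reported settings: Adam with lr=2e-5 and betas=(0.9, 0.95).
optimizer = Adam(model.parameters(), lr=2e-5, betas=(0.9, 0.95))

# Cosine decay of the learning rate over the run.
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    # ... forward/backward on up to 32K-token sequences would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```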
## Quick Start

⚠️ IMPORTANT: Please use `transformers<4.56.0` to avoid compatibility issues. We recommend `transformers==4.54.1` or earlier versions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance/Ouro-1.4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Generate with reasoning
messages = [
    {"role": "user", "content": "Solve: If 2x + 3 = 11, what is x?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=1.0,
    top_p=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
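Because the model emits its reasoning before the final answer, it can be convenient to stream tokens as they are generated. Continuing from the snippet above, a sketch using transformers' `TextStreamer`:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    top_p=0.7,
    streamer=streamer,
)
```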
## Citation

```bibtex
@article{zhu2025scaling,
  title={Scaling Latent Reasoning via Looped Language Models},
  author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Boyi Wei and Zixin Wen and Fan Yin and He Xing and Lu Li and Jiajun Shi and Kaijing Ma and Shanda Li and Taylor Kergan and Andrew Smith and Xingwei Qu and Mude Hui and Bohong Wu and Qiyang Min and Hongzhi Huang and Xun Zhou and Wei Ye and Jiaheng Liu and Jian Yang and Yunfeng Shi and Chenghua Lin and Enduo Zhao and Tianle Cai and Ge Zhang and Wenhao Huang and Yoshua Bengio and Jason Eshraghian},
  journal={arXiv preprint arXiv:2510.25741},
  year={2025},
  url={https://arxiv.org/abs/2510.25741},
}
```
## License

This model is licensed under Apache-2.0. See the LICENSE file for details.

## Project Links

- Paper: [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/abs/2510.25741)
- Code: https://github.com/Ouro-LLM/Ouro
- Project Page: https://ouro-llm.github.io