# Ouro-1.4B-Thinking

## Model Description
⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is without warranties for production use.
Ouro-1.4B-Thinking is a reasoning-specialized variant of the Ouro-1.4B base model, enhanced through supervised fine-tuning on high-quality reasoning data.
## Key Features

- **Advanced Reasoning**: Specifically optimized for mathematical and scientific reasoning tasks
- **Compact Size**: Competitive with 4B models despite having only 1.4B parameters
- **Cross-Step Consistency**: Intermediate recurrent outputs can serve as reliable proxies for final answers
- **Explicit Thinking Process**: Trained to generate detailed reasoning steps
## Configuration

### Recurrent Steps and Adaptive Exit

The model's computational behavior can be configured through the `config.json` file:

```json
{
  "total_ut_steps": 4,
  "early_exit_threshold": 1.0
}
```
- `total_ut_steps`: Controls the number of recurrent steps (default: 4). Adjust this value to trade off performance against computation time.
- `early_exit_threshold`: Controls the adaptive exit mechanism (default: 1.0). Lower values encourage earlier exit; 1.0 means all steps are always used.
**Example: Modify recurrent steps**

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
config.total_ut_steps = 3  # Use 3 recurrent steps instead of 4

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-1.4B-Thinking",
    config=config,
    device_map="auto",
)
```
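The adaptive exit threshold can be overridden in the same way. A minimal sketch, using an illustrative value of 0.9 rather than a tuned recommendation:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
config.early_exit_threshold = 0.9  # lower than the default 1.0 to encourage earlier exit

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-1.4B-Thinking",
    config=config,
    device_map="auto",
)
```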
**Note**: vLLM does not currently support the adaptive exit feature due to its inference optimization characteristics. When using vLLM, the model always executes the full number of `total_ut_steps`.
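For vLLM users, a minimal serving sketch is shown below. The chat-template call mirrors the Quick Start example; whether `trust_remote_code=True` is needed depends on how the architecture is registered in your vLLM version, so treat that flag as an assumption.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "ByteDance/Ouro-1.4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt; under vLLM, adaptive exit is not applied,
# so all configured total_ut_steps are always executed.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve: If 2x + 3 = 11, what is x?"}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_name, trust_remote_code=True)  # trust_remote_code is an assumption
outputs = llm.generate([prompt], SamplingParams(temperature=1.0, top_p=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)
```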
## Model Architecture

Based on Ouro-1.4B with additional reasoning fine-tuning:
| Configuration | Value |
|---|---|
| Parameters | 1.4B |
| Layers | 24 |
| Recurrent Steps | 4 |
| Hidden Size | 2048 |
| Attention | Multi-Head Attention (MHA) |
| FFN Activation | SwiGLU |
| Position Embedding | RoPE |
| Vocabulary Size | 49,152 |
| Context Length | 32K (SFT) |
| Normalization | Sandwich RMSNorm |
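These values can be cross-checked against the released checkpoint by dumping its configuration. The snippet below only relies on `total_ut_steps` and `early_exit_threshold`, which are documented above; otherwise it simply prints whatever fields the config object exposes.

```python
from transformers import AutoConfig

# Print the full configuration shipped with the checkpoint.
config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B-Thinking")
print(config)

# Fields documented in the Configuration section above.
print("recurrent steps:", config.total_ut_steps)              # 4 by default
print("early exit threshold:", config.early_exit_threshold)   # 1.0 by default
```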
## Training Details

### Pre-training

- **Training Tokens**: 7.7T tokens across 4 stages
- **Base Architecture**: Ouro-1.4B
### Supervised Fine-Tuning

- **Data Size**: ~8.3M examples
- **Data Composition**:
  - Mathematics: 3.5M examples (OpenThoughts3, AceReason-1.1-SFT)
  - Code: 3.2M examples (AceReason, OpenCodeReasoning, Llama-Nemotron, OpenThoughts3)
  - Science: 808K examples (OpenThoughts3, Llama-Nemotron)
  - Chat: 767K examples (DeepWriting-20K)
- **Training**: 2 epochs, max sequence length 32K
- **Optimizer**: Adam (lr=2×10⁻⁵, β=(0.9, 0.95))
- **Scheduler**: Cosine decay
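For reference, a minimal PyTorch sketch of the stated optimizer and schedule; the model, training loop, and total step count below are placeholders for illustration, not the actual training code:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholders for illustration only.
model = torch.nn.Linear(2048, 2048)
total_steps = 10_000

# Reported settings: Adam with lr=2e-5 and betas=(0.9, 0.95).
optimizer = Adam(model.parameters(), lr=2e-5, betas=(0.9, 0.95))

# Cosine decay of the learning rate over the run.
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    # ... forward/backward on up to 32K-token sequences would go here ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```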
## Quick Start

⚠️ IMPORTANT: Please use `transformers<4.56.0` to avoid compatibility issues. We recommend `transformers==4.54.1` or earlier versions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance/Ouro-1.4B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Generate with reasoning
messages = [
    {"role": "user", "content": "Solve: If 2x + 3 = 11, what is x?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=1.0,
    top_p=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
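Because the model emits its reasoning before the final answer, it can be convenient to stream tokens as they are generated. Continuing from the snippet above, a sketch using transformers' `TextStreamer`:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    top_p=0.7,
    streamer=streamer,
)
```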
## Citation

```bibtex
@article{zhu2025scaling,
  title={Scaling Latent Reasoning via Looped Language Models},
  author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Boyi Wei and Zixin Wen and Fan Yin and He Xing and Lu Li and Jiajun Shi and Kaijing Ma and Shanda Li and Taylor Kergan and Andrew Smith and Xingwei Qu and Mude Hui and Bohong Wu and Qiyang Min and Hongzhi Huang and Xun Zhou and Wei Ye and Jiaheng Liu and Jian Yang and Yunfeng Shi and Chenghua Lin and Enduo Zhao and Tianle Cai and Ge Zhang and Wenhao Huang and Yoshua Bengio and Jason Eshraghian},
  journal={arXiv preprint arXiv:2510.25741},
  year={2025},
  url={https://arxiv.org/abs/2510.25741},
}
```
## License

This model is licensed under Apache-2.0. See the LICENSE file for details.

## Project Links

- Paper: [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/abs/2510.25741)
- Code: https://github.com/Ouro-LLM/Ouro
- Project Page: https://ouro-llm.github.io