Search-R3: Reasoning-Reinforced Representation for Search (paper)

Code: https://github.com/ytgui/Search-R3

Welcome to Search-R3, our cool approach to making LLMs way better at search tasks by teaching them to embed after they think.

What's This All About?

We've figured out a neat trick: when language models reason step-by-step and then create embeddings based on that reasoning, they get much better at understanding what you're really searching for.

Unlike BERT-based embedding models, which use a fixed <cls> token for representation, this model creates embeddings by generating a special <|embed_token|> token during chat conversations.
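
For intuition, here's a minimal, unofficial sketch of how such an embedding can be read out with plain transformers. It assumes <|embed_token|> is registered in the tokenizer and that the embedding is the last-layer hidden state at that token's position; the SearchR3 wrapper shown later handles all of this for you:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ytgui/Search-R3.0-Small')
model = AutoModelForCausalLM.from_pretrained(
    'ytgui/Search-R3.0-Small', torch_dtype='auto'
)

# text that already ends with the special token, e.g. generated reasoning
text = 'pandas is the go-to library. The final embedding is: <|embed_token|>'
inputs = tokenizer(text, return_tensors='pt')
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]

# read the hidden state at the (last) <|embed_token|> position
embed_id = tokenizer.convert_tokens_to_ids('<|embed_token|>')
pos = (inputs['input_ids'][0] == embed_id).nonzero()[-1].item()
embedding = hidden[0, pos]  # shape: (hidden_size,)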

Getting Started

The model is basically an auto-regressive language model. The easiest way to test it out is with a simple chat: just ask "Who are you?" and see how it responds.

from pprint import pprint
from transformers import pipeline

generator = pipeline(
    task='text-generation',
    model='ytgui/Search-R3.0-Small',
    torch_dtype='auto', device_map='auto'
)
messages = [
    {"role": 'user', 'content': 'Who are you?'},
]
response = generator(messages, max_new_tokens=256)
pprint(response)

Output:

I'm Qwen, an AI assistant created by Alibaba Cloud.
I'm here to help you with any questions you have. If you have any concerns or need assistance, feel free to ask!

Note: we use Qwen-2.5-Instruct as the base model and keep its chat template unchanged.
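
Since the template is unchanged, you can also build the prompt by hand with the standard tokenizer API; this is plain transformers, nothing Search-R3-specific:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('ytgui/Search-R3.0-Small')
prompt = tokenizer.apply_chat_template(
    [{'role': 'user', 'content': 'Who are you?'}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # standard Qwen-2.5 chat markup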

Show Me Something Cool

Here's a more complete example (example.py) showing how to use Search-R3 to find relevant documents:

import torch
from pprint import pprint

# 1. using the wrapped SearchR3 class (defined in example.py)
model = SearchR3(
    'ytgui/Search-R3.0-Small', max_length=1024, batch_size=1
)

# 2. generating the reasoning path w.r.t. the user query
reasoning = model.generate(
    batch=['what python library is useful for data analysis?']
)
pprint(reasoning)

# 3. embedding the reasoning and comparing distances to pre-encoded documents
documents = [
    'pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.',
    'The giant panda (Ailuropoda melanoleuca), also known as the panda bear or simply panda, is a bear species endemic to China. It is characterised by its white coat with black patches around the eyes, ears, legs and shoulders.',
]
E_d = model.encode(batch=documents)
E_q = model.encode(batch=reasoning)
print('distance:', torch.cdist(E_q, E_d, p=2.0))

Output:

what python library is useful for data analysis?

**Key Terminology**:
- **Pandas**: For data manipulation and analysis, offering DataFrame structures and operations.
- **NumPy**: For numerical operations, providing arrays and mathematical functions.
- **Matplotlib**: For data visualization, offering plots and charts.
- **Scikit-learn**: For machine learning algorithms and model evaluation.
The final embedding is: <|embed_token|>

distance: tensor([[0.8282, 1.2895]])
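
Lower distance means a better match: the query correctly lands on the pandas library (0.8282) rather than the panda bear (1.2895). Turning distances into a ranking is plain PyTorch, nothing Search-R3-specific:

# rank documents by ascending distance to the query embedding
scores = torch.cdist(E_q, E_d, p=2.0)
for rank, idx in enumerate(scores[0].argsort().tolist()):
    print(f'#{rank + 1} (dist={scores[0, idx].item():.4f}):', documents[idx][:60])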

And that's it! Search-R3 creates embeddings that capture the essence of content better than traditional embedding models, because it thinks through the meaning first and then creates the embedding. This makes it especially good at handling complex questions and nuanced content.

Limitations

Our model is currently in preview phase for academic purposes only. It's not yet stable or reliable enough for industrial or commercial applications. There are a few things to keep in mind when using Search-R3:

  • Model size: we built this on a 1.5B parameter instruction model and applied RL directly, without distillation. Because of its relatively small size, the model can sometimes struggle to follow instructions consistently. If you try prompts other than the ones we provide, the model might not generate embedding tokens reliably; a cheap sanity check is sketched after this list.

  • Sequence length: we didn't train the model on very long sequences, so performance can degrade on longer inputs; checking token counts up front (also sketched below) is a simple guard. And if you're benchmarking against us, that's one way you might be able to beat our scores!
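
Both issues are easy to guard against. A minimal, unofficial sketch, reusing the SearchR3 wrapper and documents list from the example above (and assuming, as there, that model.generate returns the decoded reasoning strings):

from transformers import AutoTokenizer

# 1) check that the generated reasoning actually contains the special token
reasoning = model.generate(
    batch=['what python library is useful for data analysis?']
)
if not all('<|embed_token|>' in r for r in reasoning):
    print('warning: no embedding token generated; retry or use the stock prompts')

# 2) check input lengths; the wrapper above was built with max_length=1024
tokenizer = AutoTokenizer.from_pretrained('ytgui/Search-R3.0-Small')
n_tokens = [len(tokenizer(doc)['input_ids']) for doc in documents]
print('token counts:', n_tokens)  # keep these well under 1024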

Citation

If you use Search-R3 in your research or applications, please cite our paper:

@misc{gui2025searchr3,
      title={Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models}, 
      author={Yuntao Gui and James Cheng},
      year={2025},
      eprint={2510.07048},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07048}, 
}