---
license: apache-2.0  
base_model: microsoft/MiniLM-L6-v2  
tags:  
- transformers  
- sentence-transformers  
- sentence-similarity  
- feature-extraction  
- text-embeddings-inference  
- information-retrieval  
- knowledge-distillation  
language:
- en
---
<div style="display: flex; justify-content: center;">      
    <div style="display: flex; align-items: center; gap: 10px;">      
        <img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">      
        <span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-ir</span>      
    </div>      
</div>  

# Content

1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)

# Introduction

`mdbr-leaf-ir` is a compact, high-performance text embedding model designed specifically for **information retrieval (IR)** tasks, e.g., the retrieval component of Retrieval-Augmented Generation (RAG) pipelines.

To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).

If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.

> [!NOTE]
> This model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.

# Technical Report

A technical report detailing our proposed `LEAF` training procedure will be available soon (link will be added here).

# Highlights  

* **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures, enabling even better retrieval quality. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.

## Benchmark Comparison
  
The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir` compared to other retrieval models.

`mdbr-leaf-ir` ranks #1 on the BEIR public leaderboard, and when run in asymmetric "**(asym.)**" mode as described [here](#asymmetric-retrieval-setup), the results improve even further.

| Model                              | Size    | BEIR  Avg. (nDCG@10) |  
|------------------------------------|---------|----------------------|
| OpenAI text-embedding-3-large      | Unknown | 55.43                |  
| **mdbr-leaf-ir (asym.)**           | 23M     | **54.03**            |  
| **mdbr-leaf-ir**                   | 23M     | **53.55**            |  
| snowflake-arctic-embed-s           | 32M     | 51.98                |  
| bge-small-en-v1.5                  | 33M     | 51.65                |  
| OpenAI text-embedding-3-small      | Unknown | 51.08                |  
| granite-embedding-small-english-r2 | 47M     | 50.87                |  
| snowflake-arctic-embed-xs          | 23M     | 50.15                |  
| e5-small-v2                        | 33M     | 49.04                |  
| SPLADE++                           | 110M    | 48.88                |  
| MiniLM-L6-v2                       | 23M     | 41.95                |  
| BM25                               | –       | 41.14                |  


# Quickstart  
  
## Sentence Transformers  
  
```python  
from sentence_transformers import SentenceTransformer  
  
# Load the model  
model = SentenceTransformer("MongoDB/mdbr-leaf-ir")  
  
# Example queries and documents  
queries = [
    "What is machine learning?",  
    "How does neural network training work?"  
]  
  
documents = [  
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",  
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."  
]  
  
# Encode queries and documents  
query_embeddings = model.encode(queries, prompt_name="query")  
document_embeddings = model.encode(documents)  
  
# Compute similarity scores  
scores = model.similarity(query_embeddings, document_embeddings)  

# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")

# Query: What is machine learning?
#  Similarity: 0.6857 | Document 0: Machine learning is a subset of ...
#  Similarity: 0.4598 | Document 1: Neural networks are trained ...
# 
# Query: How does neural network training work?
#  Similarity: 0.4238 | Document 0: Machine learning is a subset of ...
#  Similarity: 0.5723 | Document 1: Neural networks are trained ...
```
  
## Transformers Usage  

See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
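
If you prefer to call `transformers` directly, the sketch below mirrors the Quickstart example. It is a minimal illustration only: the mean-pooling step and the query prefix used here are illustrative assumptions, so please treat the linked notebook as the authoritative reference.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "MongoDB/mdbr-leaf-ir"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

# Assumed query prefix; the exact prompt is defined in the model's
# Sentence Transformers configuration and in the linked notebook.
query_prompt = "Represent this sentence for searching relevant passages: "
queries = [query_prompt + q for q in [
    "What is machine learning?",
    "How does neural network training work?",
]]
documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
]

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Mean pooling over non-padding tokens (assumed pooling strategy)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(pooled, dim=-1)

scores = embed(queries) @ embed(documents).T
print(scores)
```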
  
## Asymmetric Retrieval Setup
  
`mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries can be encoded faster and more efficiently with the compact `leaf` model:
```python
from sentence_transformers import SentenceTransformer

# `queries` and `documents` are the lists from the Quickstart example above

# Use mdbr-leaf-ir for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
query_embeddings = query_model.encode(queries, prompt_name="query")  

# Use a larger model for document encoding (one-time, at index time)  
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")  
document_embeddings = doc_model.encode(documents)  

# Compute similarities  
scores = query_model.similarity(query_embeddings, document_embeddings)  
```
Retrieval results in asymmetric mode are often superior to the [standard mode above](#sentence-transformers).

## MRL Truncation

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
from torch.nn import functional as F

query_embeds = model.encode(queries, prompt_name="query", convert_to_tensor=True)
doc_embeds = model.encode(documents, convert_to_tensor=True)

# Truncate and normalize according to MRL
query_embeds = F.normalize(query_embeds[:, :256], dim=-1)
doc_embeds = F.normalize(doc_embeds[:, :256], dim=-1)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities:\n\t{similarities}")

# After MRL:
# * Embeddings dimension: 256
# * Similarities:
# 	tensor([[0.7136, 0.4989],
#           [0.4567, 0.6022]])
```

## Vector Quantization
Vector quantization, for example to `int8` or `binary`, can be performed as follows:

**Note**: For vector quantization to types other than `binary`, we suggest performing a calibration step to determine the optimal quantization ranges, as described [here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization).
Good initial values, according to the [teacher model's documentation](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5#compressing-to-128-bytes), are:
* `int8`: -0.3 and +0.3
* `int4`: -0.18 and +0.18 
```python
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities:\n{similarities}")

# After quantization:
# * Embeddings type: int8
# * Similarities:
#  [[118022  79111]
#   [ 72961  98333]]
```
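
The example above uses `int8`. For `binary` quantization, a minimal sketch (assuming the `ubinary` precision of `quantize_embeddings`, which bit-packs each embedding into `dim/8` bytes) could look as follows; documents are then ranked by Hamming distance:

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Bit-pack each float32 embedding into dim/8 bytes (uint8)
query_bin = quantize_embeddings(query_embeds, "ubinary")
doc_bin = quantize_embeddings(doc_embeds, "ubinary")

# Rank documents by Hamming distance (lower distance = more similar)
hamming = np.array([
    [np.unpackbits(q ^ d).sum() for d in doc_bin]
    for q in query_bin
])

print(f"* Bytes per document: {doc_bin.shape[1]}")
print(f"* Hamming distances:\n{hamming}")
```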

# Evaluation

Please [see here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/evaluate_models.ipynb).
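
As a quick local sanity check (not the exact setup used in the notebook above), individual BEIR tasks can be run with the [`mteb`](https://github.com/embeddings-benchmark/mteb) library; the sketch below assumes a recent `mteb` version and evaluates a single small task:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

# NFCorpus is one of the smaller BEIR retrieval tasks
tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mdbr-leaf-ir")
```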

# Citation
  
If you use this model in your work, please cite:  
  
```bibtex  
@article{mdb_leaf,  
  title         = {LEAF: Lightweight Embedding Alignment Knowledge Distillation Framework},  
  author        = {Robin Vujanic and Thomas Rueckstiess},  
  year          = {2025},
  eprint        = {TBD},
  archiveprefix = {arXiv},
  primaryclass  = {FILL HERE},
  url           = {FILL HERE}
}  
```  
  
# License  
  
This model is released under the Apache 2.0 license.
  
# Contact  
  
For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected].