Upload README.md
README.md
CHANGED
@@ -20,11 +20,13 @@ language:
 </div>
 </div>
 
+## Introduction
+
 **mdbr-leaf-ir** is a compact high-performance text embedding model specifically designed for **information retrieval (IR)** tasks.
 
 Enabling even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl).
 
-If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, summarization, please check out our [`
+If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.
 
 **Note**: this model has been developed by MongoDB Research and is not part of MongoDB's commercial offerings.
 
@@ -39,8 +41,7 @@ A technical report detailing our proposed `LEAF` training procedure is [availabl
 * **MRL and quantization support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and/or are stored using more efficient types like `int8` and `binary`. [See below](#mrl) for more information.
 
 
-## Performance
-
+<!-- ## Performance
 ### Benchmark Results
 
 * Values are nDCG@10
@@ -58,7 +59,8 @@ A technical report detailing our proposed `LEAF` training procedure is [availabl
 | `BM25` | -- | 40.8 | 23.8 | 31.8 | 15.0 | 67.6 | 78.7 | 58.9 | 30.5 | 63.8 | 16.2 | 31.9 | 62.9 | 43.5 |
 | `SPLADE v2` | 110M | 47.9 | 33.6 | 33.4 | 15.8 | 69.3 | 83.8 | 71.0 | 52.1 | 78.6 | 23.5 | 43.5 | **68.4** | 51.7 |
 | `ColBERT v2` | 110M | 46.3 | 35.6 | 33.8 | 15.4 | 69.3 | 85.2 | 73.8 | 56.2 | 78.5 | 17.6 | **44.6** | 66.7 | 51.9 |
-
+-->
+
 ## Quickstart
 
 ### Sentence Transformers
@@ -250,6 +252,10 @@ print(f"* Similarities:\n{similarities}")
 # [ 76174 99127]]
 ```
 
+## Evaluation
+
+Please refer to this <span style="color:red">TBD</span> script to replicate results (standard and asymmetric mode).
+The checkpoint used to produce the scores presented in the paper [is here](https://huggingface.co/MongoDB/mdbr-leaf-ir/commit/ea98995e96beac21b820aa8ad9afaa6fd29b243d).
 
 ## Citation
 
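
Note on the "MRL and quantization support" bullet that appears as context in this diff: the README says `mdbr-leaf-ir` embeddings compress well when truncated (MRL) or stored as `int8` or `binary`. As a rough illustration of what those three operations do to a single vector, here is a minimal standard-library Python sketch. The function names, the toy 8-dimensional vector, and the fixed `[-0.5, 0.5]` calibration range are assumptions made for illustration; none of them come from the README or from any library, and a real pipeline would calibrate the `int8` range on actual embeddings.

```python
import math


def truncate_mrl(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against a zero vector
    return [x / norm for x in head]


def quantize_int8(vec, low, high):
    """Linearly map values in [low, high] onto the int8 range [-128, 127]."""
    scale = 255.0 / (high - low)
    return [max(-128, min(127, round((x - low) * scale - 128))) for x in vec]


def quantize_binary(vec):
    """Keep one sign bit per dimension, packed eight dimensions per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 0x80 >> (i % 8)  # most significant bit first
    return bytes(packed)


# Hypothetical embedding; a real one would come from the model.
embedding = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.3, 0.4]

short = truncate_mrl(embedding, 4)            # 4 floats instead of 8
as_int8 = quantize_int8(embedding, -0.5, 0.5)  # 1 byte per dimension
as_bits = quantize_binary(embedding)           # 1 bit per dimension
```

Truncation keeps the leading dimensions, `int8` storage spends one byte per dimension, and binary storage spends one bit per dimension, so the three options trade retrieval quality for progressively smaller indexes.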