rvo committed on
Commit 4ae24fd · verified · 1 Parent(s): 26ef51d

Upload README.md

Files changed (1)
  1. README.md +24 -5
README.md CHANGED
@@ -29,7 +29,7 @@ language:
 
  # Introduction
 
- `mdbr-leaf-ir` is a compact high-performance text embedding model specifically designed for **information retrieval (IR)** tasks, e.g., the retrieval part of RAGs.
 
  To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).
 
@@ -40,13 +40,32 @@ If you are looking to perform other tasks such as classification, clustering, se
 
  # Technical Report
 
- A technical report detailing our proposed `LEAF` training procedure will be available soon.
 
  # Highlights
 
- * **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves new state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
  * **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures enabling even greater retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
- * **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl) for more information.
 
  # Quickstart
 
@@ -93,7 +112,7 @@ for i, query in enumerate(queries):
 
  ## Transformers Usage
 
- See [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
 
  ## Asymmetric Retrieval Setup
 
 
 
  # Introduction
 
+ `mdbr-leaf-ir` is a compact high-performance text embedding model specifically designed for **information retrieval (IR)** tasks, e.g., the retrieval component of Retrieval-Augmented Generation (RAG) pipelines.
 
  To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).
 
 
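Note on the Introduction hunk above: the README says the embeddings are robust to MRL truncation and vector quantization, and the Highlights mention `int8` and `binary` storage. The sketch below illustrates what those three operations generally mean. It is not taken from the README or from the model's code: the function names `truncate_mrl`, `quantize_int8`, and `quantize_binary` are hypothetical, the input vector is made up, and plain Python lists are used so the example has no dependencies. The symmetric max-abs scaling for `int8` is one common choice and only an assumption about how such vectors might be stored.

```python
import math
from typing import List, Tuple


def truncate_mrl(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components, then re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # avoid dividing by zero
    return [x / norm for x in head]


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric scalar quantization: map values to integers in [-127, 127].

    Returns the integer codes and the scale needed to approximately
    reconstruct the floats as `code * scale`.
    """
    scale = (max(abs(x) for x in vec) / 127.0) or 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def quantize_binary(vec: List[float]) -> bytes:
    """Keep one bit per dimension (1 if positive), packed 8 dimensions per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 0x80 >> (i % 8)  # most significant bit first
    return bytes(packed)


# Made-up 8-dimensional vector standing in for a real embedding.
vector = [0.5, -0.25, 0.75, -0.1, 0.3, 0.2, -0.6, 0.4]

small = truncate_mrl(vector, 4)       # 4 dimensions, unit length
codes, scale = quantize_int8(small)   # 1 byte per dimension instead of 4
bits = quantize_binary(vector)        # 1 bit per dimension
```

Truncation shrinks the number of dimensions, while `int8` and binary storage shrink the size of each dimension, so the two can be combined. How much retrieval quality survives each step is an empirical question that this sketch does not address.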
 
  # Technical Report
 
+ A technical report detailing our proposed `LEAF` training procedure will be available soon (link will be added here).
 
  # Highlights
 
+ * **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
  * **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures enabling even greater retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
+ * **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.
+
+ ## Benchmark Comparison
+
+ The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir` compared to other retrieval models.
+
+ | Model | Size | BEIR Avg. (nDCG@10) |
+ |------------------------------------|------|----------------------|
+ | **mdbr-leaf-ir** | 23M | **53.55** |
+ | snowflake-arctic-embed-s | 32M | 51.98 |
+ | bge-small-en-v1.5 | 33M | 51.65 |
+ | granite-embedding-small-english-r2 | 47M | 50.87 |
+ | snowflake-arctic-embed-xs | 23M | 50.15 |
+ | e5-small-v2 | 33M | 49.04 |
+ | SPLADE++ | 110M | 48.88 |
+ | MiniLM-L6-v2 | 23M | 41.95 |
+ | BM25 | – | 41.14 |
+
+ [//]: # (| **mdbr-leaf-ir (asym.)** | 23M | **?** | )
+
 
  # Quickstart
 
 
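Note on the Benchmark Comparison table added in the hunk above: the scores are reported as nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@k can be computed for a single query. It uses the linear-gain form (relevance divided by a log2 rank discount); some toolkits use an exponential gain instead, and this sketch makes no claim about which variant the leaderboard uses. The helper names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the relevance labels are made up.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top hit is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """DCG of the system ranking divided by the DCG of an ideal ranking.

    `ranked_relevances` holds the relevance label of each returned document,
    in the order the system ranked them. `all_relevances` holds the labels of
    every judged document for the query; if omitted, the ideal ranking is
    built from the returned documents only.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Made-up example: relevant documents returned at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1, 0])
```

A benchmark-level figure is typically the mean of this per-query value over all queries in a dataset, then averaged across datasets. The table values appear to be that average scaled by 100.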
 
  ## Transformers Usage
 
+ See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
 
  ## Asymmetric Retrieval Setup