license: apache-2.0
base_model:
- microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
language:
- en
MongoDB/mdbr-leaf-ir
Introduction
mdbr-leaf-ir is a compact, high-performance text embedding model designed specifically for information retrieval (IR) tasks.
For even greater efficiency, mdbr-leaf-ir supports flexible asymmetric architectures and is robust to vector quantization and MRL truncation.
If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our mdbr-leaf-mt model.
Note: this model has been developed by MongoDB Research and is not part of MongoDB's commercial offerings.
Technical Report
A technical report detailing our proposed LEAF training procedure is available here (TBD).
Highlights
- State-of-the-Art Performance: mdbr-leaf-ir achieves new state-of-the-art results for compact embedding models, ranking #TBD on the public BEIR benchmark leaderboard for models with <30M parameters, with an average nDCG@10 score of [TBD HERE].
- Flexible Architecture Support: mdbr-leaf-ir supports asymmetric retrieval architectures, enabling even better retrieval results. See below for more information.
- MRL and Quantization Support: embedding vectors generated by mdbr-leaf-ir compress well when truncated (MRL) and/or stored using more efficient types like int8 and binary. See below for more information.
Quickstart
Sentence Transformers
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
# Example queries and documents
queries = [
"What is machine learning?",
"How does neural network training work?"
]
documents = [
"Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
"Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
]
# Encode queries and documents
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)
# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
# Query: What is machine learning?
# Similarity: 0.6908 | Document 0: Machine learning is a subset of ...
# Similarity: 0.4598 | Document 1: Neural networks are trained ...
#
# Query: How does neural network training work?
# Similarity: 0.4432 | Document 0: Machine learning is a subset of ...
# Similarity: 0.5794 | Document 1: Neural networks are trained ...
Transformers Usage
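The model can also be used directly with the transformers library. The sketch below is not the official recipe: the CLS pooling strategy and the query prefix are assumptions carried over from the teacher model (snowflake-arctic-embed-m-v1.5) and should be verified against this model's sentence-transformers configuration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer
model_name = "MongoDB/mdbr-leaf-ir"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()
# Assumed query prefix, taken from the teacher model's documentation
query_prefix = "Represent this sentence for searching relevant passages: "
def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # CLS pooling followed by L2 normalization (assumed; see note above)
    return F.normalize(outputs.last_hidden_state[:, 0], dim=-1)
query_embeddings = embed([query_prefix + q for q in queries])
document_embeddings = embed(documents)
# Cosine similarities (embeddings are unit-normalized)
scores = query_embeddings @ document_embeddings.T
print(scores)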
Asymmetric Retrieval Setup
mdbr-leaf-ir is aligned to snowflake-arctic-embed-m-v1.5, the model it was distilled from, which makes the asymmetric setup below possible:
# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
document_embeddings = doc_model.encode(documents)
# Use mdbr-leaf-ir for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
query_embeddings = query_model.encode(queries, prompt_name="query")
# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
Retrieval results from this asymmetric mode are usually superior to those of the standard mode shown above.
MRL
Embeddings have been trained via MRL and can be truncated for more efficient storage:
from torch.nn import functional as F
query_embeds = model.encode(queries, prompt_name="query", convert_to_tensor=True)
doc_embeds = model.encode(documents, convert_to_tensor=True)
# Truncate and normalize according to MRL
query_embeds = F.normalize(query_embeds[:, :256], dim=-1)
doc_embeds = F.normalize(doc_embeds[:, :256], dim=-1)
similarities = model.similarity(query_embeds, doc_embeds)
print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities:\n\t{similarities}")
# After MRL:
# * Embeddings dimension: 256
# * Similarities:
# tensor([[0.7202, 0.5006],
# [0.4744, 0.6083]])
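Alternatively, recent versions of sentence-transformers can truncate at load time via the truncate_dim argument. This is a convenience sketch rather than part of the original example; note that we still re-normalize manually after truncation:
# Same as above, but letting sentence-transformers handle the truncation
model_256 = SentenceTransformer("MongoDB/mdbr-leaf-ir", truncate_dim=256)
query_embeds = model_256.encode(queries, prompt_name="query", convert_to_tensor=True)
doc_embeds = model_256.encode(documents, convert_to_tensor=True)
# Re-normalize the truncated vectors, matching the manual example above
query_embeds = F.normalize(query_embeds, dim=-1)
doc_embeds = F.normalize(doc_embeds, dim=-1)
similarities = model_256.similarity(query_embeds, doc_embeds)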
Vector Quantization
Vector quantization, for example to int8 or binary, can be performed as follows:
Note: For vector quantization to types other than binary, we suggest performing a calibration to determine the optimal ranges; see here. Good initial values, according to the teacher model's documentation, are:
- int8: -0.3 and +0.3
- int4: -0.18 and +0.18
from sentence_transformers.quantization import quantize_embeddings
import torch
query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)
# Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)
# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T
print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities:\n{similarities}")
# After quantization:
# * Embeddings type: int8
# * Similarities:
# [[119073 78877]
# [ 76174 99127]]
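Binary quantization, also mentioned above, requires no calibration ranges. The following is a minimal sketch (not part of the original card) that uses the packed-bit "ubinary" precision of quantize_embeddings and ranks documents by Hamming distance:
import numpy as np
query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)
# Pack each embedding to 1 bit per dimension (uint8, dim/8 bytes per vector)
query_bits = quantize_embeddings(query_embeds, "ubinary")
doc_bits = quantize_embeddings(doc_embeds, "ubinary")
# Hamming distance: XOR the packed bits and count differing bits (lower = more similar)
hamming = np.unpackbits(np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :]), axis=-1).sum(axis=-1)
print(hamming)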
Evaluation
Please refer to this TBD script to replicate our results. The checkpoint used to produce the scores presented in the paper is available here.
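Until that script is available, a minimal sketch using the mteb package (this is not the authors' official evaluation pipeline) to score the model on a single BEIR retrieval task could look like:
import mteb
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
# Run one BEIR task as an example; the full benchmark covers many more datasets
tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mdbr-leaf-ir")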
Citation
If you use this model in your work, please cite:
@article{mdb_leaf,
title = {LEAF: Lightweight Embedding Alignment Knowledge Distillation Framework},
author = {Robin Vujanic and Thomas Rueckstiess},
year = {2025},
eprint = {TBD},
archiveprefix = {arXiv},
primaryclass = {FILL HERE},
url = {FILL HERE}
}
License
This model is released under the Apache 2.0 (TBD) license.
Contact
For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected].