BAAI/bge-m3


Shitao committed on Feb 1, 2024

Commit 4277867 (verified) · 1 parent: 1f5d3ac

Update README.md

Files changed (1)

README.md +7 -0


@@ -209,6 +209,13 @@ print(model.compute_score(sentence_pairs,
 - Long Document Retrieval
   - MLDR:
   ![avatar](./imgs/long.jpg)
   - NarrativeQA:
   ![avatar](./imgs/nqa.jpg)

 - Long Document Retrieval
   - MLDR:
   ![avatar](./imgs/long.jpg)
+  Please note that MLDR is a long-document retrieval dataset that we constructed using an LLM.
+  It covers 13 languages and includes training, validation, and test sets.
+  We used the MLDR training set to improve the model's long-document retrieval capability,
+  so `Dense w.o.long` (fine-tuned without the long-document dataset) is the fairer point of comparison with the baselines.
+  We will open-source this dataset to help address the current lack of open-source multilingual long-text retrieval datasets.
+  We believe this data will help the open-source community train document retrieval models.
   - NarrativeQA:
   ![avatar](./imgs/nqa.jpg)