Update README.md
README.md
CHANGED
@@ -209,6 +209,13 @@ print(model.compute_score(sentence_pairs,
 - Long Document Retrieval
 - MLDR:
 
+Please note that MLDR is a document retrieval dataset we constructed via LLM,
+covering 13 languages, including test set, validation set, and training set.
+We utilized the training set from MLDR to enhance the model's long document retrieval capabilities.
+Therefore, comparing baseline with `Dense w.o.long`(fine-tuning without long document dataset) is more equitable.
+Additionally, this long document retrieval dataset will be open-sourced to address the current lack of open-source multilingual long text retrieval datasets.
+We believe that this data will be helpful for the open-source community in training document retrieval models.
+
 - NarritiveQA:
 
 