Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ tags:
 - MSMARCO
 ---
 # Description
-We use MS Marco Encoder msmarco-MiniLM-L-6-v3 to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
+We use MS Marco Encoder msmarco-MiniLM-L-6-v3 from the sentence-transformers library to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
 
 The dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).
 
@@ -28,4 +28,7 @@ bi_encoder.max_seq_length = 256
 wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
 
 ```
-This operation took 35min on a Google Colab notebook with GPU.
+This operation took 35min on a Google Colab notebook with GPU.
+
+# Reference
+More information of MS Marco encoders here https://www.sbert.net/docs/pretrained-models/ce-msmarco.html
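The README describes encoding each Wikipedia first paragraph into a vector but does not say how those vectors are meant to be queried. As a rough illustration only, the sketch below assumes they would be ranked against a query vector by cosine similarity, which is a common choice for bi-encoder retrieval. The helper names (`cosine_similarity`, `top_k`) and the three-dimensional toy vectors are hypothetical stand-ins, not part of the model card; real embeddings would come from `bi_encoder.encode(...)`, and sentence-transformers ships `util.semantic_search` for this purpose.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, score) pairs for the k corpus vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for paragraph embeddings (hypothetical values, not model output).
toy_corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
toy_query = [1.0, 0.1, 0.0]

hits = top_k(toy_query, toy_corpus, k=2)
for index, score in hits:
    print(index, round(score, 3))
```

The pure-Python loop keeps the example dependency-free; for the full Wikipedia corpus a vectorized library routine would be the practical choice.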