Marqo/gcl-e5-large-v2-113-gs-full

Feature Extraction

text-embeddings-inference


Jesse-marqo committed on Aug 13, 2024

Commit eb26513 · verified · 1 Parent(s): 4750856

Update README.md

Files changed (1)

README.md +38 -1


@@ -1,4 +1,41 @@
 ---
 license: apache-2.0
 ---
-Rank-tuned e5-large-v2 on the Marqo-GS-10M dataset for ecommerce. Full details here https://github.com/marqo-ai/GCL

+Rank-tuned e5-large-v2 on the Marqo-GS-10M dataset for ecommerce. Full details here https://github.com/marqo-ai/GCL
+```python
+import torch.nn.functional as F
+from torch import Tensor
+from transformers import AutoTokenizer, AutoModel
+def average_pool(last_hidden_states: Tensor,
+                 attention_mask: Tensor) -> Tensor:
+    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
+    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
+# Each input text should start with "query: " or "passage: ".
+# For tasks other than retrieval, you can simply use the "query: " prefix.
+input_texts = ['query: Espresso Pitcher with Handle',
+               'query: Women’s designer handbag sale',
+               "passage: Dianoo Espresso Steaming Pitcher, Espresso Milk Frothing Pitcher Stainless Steel",
+               "passage: Coach Outlet Eliza Shoulder Bag - Black - One Size"]
+tokenizer = AutoTokenizer.from_pretrained('Marqo/marqo-gcl-e5-large-v2-130')
+model_new = AutoModel.from_pretrained('Marqo/marqo-gcl-e5-large-v2-130')
+# Tokenize the input texts
+batch_dict = tokenizer(input_texts, max_length=77, padding=True, truncation=True, return_tensors='pt')
+outputs = model_new(**batch_dict)
+embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
+# normalize embeddings
+embeddings = F.normalize(embeddings, p=2, dim=1)
+scores = (embeddings[:2] @ embeddings[2:].T) * 100
+print(scores.tolist())
+```
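
The snippet added to the README relies on masked mean pooling followed by L2 normalization and a dot product scaled by 100. The following is a minimal, dependency-free sketch of that arithmetic on toy numbers, assuming plain Python lists in place of tensors. The helpers `l2_normalize` and `score` and all input values are hypothetical and chosen only for illustration; this `average_pool` is a list-based stand-in for the README function, not the model's actual code path.

```python
import math

def average_pool(hidden, mask):
    """Mean of the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(hidden[0])
    kept = [vec for vec, m in zip(hidden, mask) if m]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def score(a, b):
    """Cosine similarity of two vectors, scaled by 100 as in the README snippet."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 100 * sum(x * y for x, y in zip(a, b))

# Toy "last hidden states": 3 token positions, 2 dimensions each.
# The last position is padding (mask 0), so its values must not affect the mean.
query_hidden = [[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]
query_mask = [1, 1, 0]
passage_hidden = [[0.0, 2.0], [4.0, 2.0], [0.0, 0.0]]
passage_mask = [1, 1, 0]

query_vec = average_pool(query_hidden, query_mask)        # [2.0, 0.0]
passage_vec = average_pool(passage_hidden, passage_mask)  # [2.0, 2.0]
print(query_vec, passage_vec, round(score(query_vec, passage_vec), 2))
```

Because both vectors are normalized before the dot product, the score depends only on the angle between them, so the scaled value always falls between -100 and 100.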