Parrotlet-e: Indic Medical Embedding Model
Parrotlet-e is a state-of-the-art multilingual medical embedding model designed for understanding and linking medical terms across Indian languages. It is optimised for entity-level representation of clinical concepts such as symptoms, diagnoses, and anatomical structures. This enables accurate medical coding, semantic search, and cross-lingual retrieval in healthcare applications.
The model is fine-tuned from bge-m3 using weakly supervised contrastive learning with Multi-Similarity Loss on over 18 million multilingual medical term pairs aligned with SNOMED CT and UMLS. It supports both native and romanized scripts across 12 Indic languages and English, and is robust to abbreviations, spelling variations, and colloquial expressions commonly found in clinical documentation.
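The actual training pipeline is not published here, but the sketch below illustrates what contrastive fine-tuning with Multi-Similarity Loss on medical term pairs can look like, using the `pytorch-metric-learning` implementation of the loss. The term pairs, pooling choice, and hyperparameters (`alpha`, `beta`, `base`) are illustrative assumptions, not the model's actual training configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from pytorch_metric_learning import losses

# Illustrative assumption: (term, synonym) pairs that share a concept label.
term_pairs = [
    ("diabetes mellitus", "मधुमेह"),          # concept 0
    ("hypertension", "high blood pressure"),  # concept 1
]

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")

# Flatten pairs into one batch; terms from the same pair share a label.
texts = [term for pair in term_pairs for term in pair]
labels = torch.tensor([i for i, _ in enumerate(term_pairs) for _ in range(2)])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)

# CLS pooling followed by L2 normalisation (pooling choice is an assumption).
embeddings = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)

# Multi-Similarity Loss pulls same-concept terms together and pushes others apart.
loss_fn = losses.MultiSimilarityLoss(alpha=2, beta=50, base=0.5)
loss = loss_fn(embeddings, labels)
loss.backward()
```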
Supported Indic languages:
- Hindi
- Kannada
- Marathi
- Malayalam
- Tamil
- Telugu
- Odia
- Assamese
- Bengali
- Urdu
- Gujarati
- Punjabi
Loading the model from Hugging Face Hub
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ekacare/parrotlet-e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Sample medical terms (can be in any supported language)
texts = [
    "diabetes mellitus",
    "मधुमेह",
    "sugar problem"
]

# Tokenize input
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state

# Mean pooling over tokens, weighted by the attention mask
attention_mask = inputs['attention_mask']
embeddings = (embeddings * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(1).unsqueeze(-1)

# Normalize embeddings to unit length
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```
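Because the embeddings are L2-normalised, cosine similarity reduces to a dot product. A small follow-up using the `embeddings` and `texts` from the snippet above (the exact scores depend on the model weights):

```python
# Pairwise cosine similarities; cross-lingual synonyms such as
# "diabetes mellitus" and "मधुमेह" should score close to each other.
similarity = embeddings @ embeddings.T
for i, query in enumerate(texts):
    for j, candidate in enumerate(texts):
        if i < j:
            print(f"{query!r} vs {candidate!r}: {similarity[i, j]:.3f}")
```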
Evaluation Results on Eka-IndicMTEB
We evaluated Parrotlet-e on the Eka-IndicMTEB benchmark using KARMA, reporting Recall@1, Recall@3, and Recall@5. A sketch of how Recall@k can be computed for an embedding retrieval task follows the table.
| Model | Recall@1 | Recall@3 | Recall@5 |
|---|---|---|---|
| Parrotlet-e | 0.7206 | 0.8320 | 0.8512 |
| cambridgeltl/SapBERT-from-PubMedBERT-fulltext | 0.3574 | 0.4427 | 0.4684 |
| BAAI/bge-m3 | 0.3146 | 0.4060 | 0.4444 |
| google/embeddinggemma-300m | 0.1031 | 0.1408 | 0.1525 |
| ai4bharat/IndicBERTv2-MLM-only | 0.0311 | 0.0573 | 0.0724 |
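The Eka-IndicMTEB/KARMA evaluation harness is not reproduced here; the snippet below is only a minimal sketch of how Recall@k is typically computed for embedding-based retrieval, with hypothetical `queries`, `candidates`, and `gold` tensors standing in for real benchmark data.

```python
import torch

def recall_at_k(query_embeddings, candidate_embeddings, gold_indices, k):
    # Score every query against every candidate (embeddings assumed L2-normalised).
    scores = query_embeddings @ candidate_embeddings.T
    topk = scores.topk(k, dim=1).indices
    # A query counts as a hit if its gold candidate appears in the top-k results.
    hits = (topk == gold_indices.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Hypothetical usage with random data, just to show the shapes involved
# (1024 is the bge-m3 embedding dimension).
queries = torch.nn.functional.normalize(torch.randn(100, 1024), dim=1)
candidates = torch.nn.functional.normalize(torch.randn(500, 1024), dim=1)
gold = torch.randint(0, 500, (100,))
for k in (1, 3, 5):
    print(f"Recall@{k}: {recall_at_k(queries, candidates, gold, k):.4f}")
```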
EkaCare Parrotlet-e and the Eka-IndicMTEB benchmark together provide a foundation for building robust, cross-lingual medical AI systems — enabling better coding, documentation, and understanding across India’s diverse clinical landscape.
Authentication (if required)
If the model repository requires authentication, log in to your Hugging Face account, generate an access token under Hugging Face Settings, and set it in your environment:
```bash
export HF_TOKEN="your-access-token"
```
Alternatively, use the Hugging Face CLI to log in:
```bash
huggingface-cli login
```
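The token can also be passed directly when loading the model. A minimal sketch, assuming the `HF_TOKEN` environment variable was set as above:

```python
import os
from transformers import AutoTokenizer, AutoModel

# Read the token set via `export HF_TOKEN=...` and pass it to from_pretrained.
token = os.environ.get("HF_TOKEN")
tokenizer = AutoTokenizer.from_pretrained("ekacare/parrotlet-e", token=token)
model = AutoModel.from_pretrained("ekacare/parrotlet-e", token=token)
```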
License
This model is released under the CC BY-SA 4.0 license.