
Parrotlet-e: Indic Medical Embedding Model

Parrotlet-e is a state-of-the-art multilingual medical embedding model designed for understanding and linking medical terms across Indian languages. It is optimised for entity-level representation of clinical concepts such as symptoms, diagnoses, and anatomical structures, enabling accurate medical coding, semantic search, and cross-lingual retrieval in healthcare applications.

The model is fine-tuned from bge-m3 using weakly supervised contrastive learning with Multi-Similarity Loss on over 18 million multilingual medical term pairs aligned with SNOMED CT and UMLS. It supports both native and romanized scripts across 12 Indic languages and English, and is robust to abbreviations, spelling variations, and colloquial expressions commonly found in clinical documentation.
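
For intuition, here is a minimal PyTorch sketch of the Multi-Similarity Loss objective used in contrastive training. The hyperparameters and the simple in-batch pair construction are illustrative assumptions, not the actual Parrotlet-e training setup, and the published loss also includes a pair-mining step that is omitted here:

import torch

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, base=0.5):
    # embeddings: (batch, dim) L2-normalized term embeddings
    # labels:     (batch,) concept IDs; terms sharing an ID count as positives
    sim = embeddings @ embeddings.t()
    losses = []
    for i in range(sim.size(0)):
        pos_mask = labels == labels[i]
        pos_mask[i] = False                      # drop the self-pair
        neg_mask = labels != labels[i]
        pos_sim, neg_sim = sim[i][pos_mask], sim[i][neg_mask]
        if pos_sim.numel() == 0 or neg_sim.numel() == 0:
            continue
        # Pull same-concept pairs together, push different-concept pairs apart
        pos_loss = torch.log1p(torch.exp(-alpha * (pos_sim - base)).sum()) / alpha
        neg_loss = torch.log1p(torch.exp(beta * (neg_sim - base)).sum()) / beta
        losses.append(pos_loss + neg_loss)
    return torch.stack(losses).mean()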

Supported Indic languages:

  • Hindi
  • Kannada
  • Marathi
  • Malayalam
  • Tamil
  • Telugu
  • Odia
  • Assamese
  • Bengali
  • Urdu
  • Gujarati
  • Punjabi

Loading the model from Hugging Face Hub

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ekacare/parrotlet-e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # inference only: disables dropout

# Sample medical terms (can be in any supported language)
texts = [
    "diabetes mellitus",
    "मधुमेह",
    "sugar problem"
]

# Tokenize input
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Mean pooling over non-padding tokens
attention_mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (embeddings * attention_mask).sum(dim=1) / attention_mask.sum(dim=1)

# L2-normalize so that the dot product equals cosine similarity
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
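
Because the embeddings are unit length, their dot products are cosine similarities, so the three sample terms can be compared directly (a small usage sketch; the exact scores depend on the model weights):

# Pairwise cosine similarities between the sample terms
similarities = embeddings @ embeddings.T
print(similarities)
# If the model links them to the same concept, "diabetes mellitus", "मधुमेह"
# and "sugar problem" should all score close to one another.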

Evaluation Results on Eka-IndicMTEB

We evaluated Parrotlet-e on the Eka-IndicMTEB benchmark using KARMA, reporting Recall@1, Recall@3, and Recall@5 (a short sketch of how Recall@k is computed follows the table).

Model                                           Recall@1   Recall@3   Recall@5
Parrotlet-e                                       0.7206     0.8320     0.8512
cambridgeltl/SapBERT-from-PubMedBERT-fulltext     0.3574     0.4427     0.4684
BAAI/bge-m3                                       0.3146     0.4060     0.4444
google/embeddinggemma-300m                        0.1031     0.1408     0.1525
ai4bharat/IndicBERTv2-MLM-only                    0.0311     0.0573     0.0724
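
As referenced above, here is a minimal sketch of how Recall@k is typically computed for this kind of embedding-based retrieval. The variable names are illustrative; this is not the benchmark's own evaluation code:

import torch

def recall_at_k(query_emb, cand_emb, gold_idx, k):
    # query_emb: (num_queries, dim) and cand_emb: (num_candidates, dim), L2-normalized
    # gold_idx:  (num_queries,) index of the correct candidate concept for each query
    sims = query_emb @ cand_emb.T                       # cosine similarities
    topk = sims.topk(k, dim=1).indices                  # top-k candidates per query
    hits = (topk == gold_idx.unsqueeze(1)).any(dim=1)   # gold concept retrieved?
    return hits.float().mean().item()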

EkaCare Parrotlet-e and the Eka-IndicMTEB benchmark together provide a foundation for building robust, cross-lingual medical AI systems — enabling better coding, documentation, and understanding across India’s diverse clinical landscape.

Authentication (if required)

This repository is publicly visible, but you must accept its access conditions on the model page before the files can be downloaded. Log in to your Hugging Face account, generate an access token at Hugging Face Settings, and set it in your environment:

export HF_TOKEN="your-access-token"

Alternatively, use the Hugging Face CLI to log in:

huggingface-cli login
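
Once the token is set in HF_TOKEN (or you have logged in via the CLI), the from_pretrained calls above pick it up automatically; it can also be passed explicitly, for example:

import os
from transformers import AutoModel, AutoTokenizer

# Read the access token from the environment and pass it explicitly
token = os.environ.get("HF_TOKEN")
tokenizer = AutoTokenizer.from_pretrained("ekacare/parrotlet-e", token=token)
model = AutoModel.from_pretrained("ekacare/parrotlet-e", token=token)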

License

This model is released under the CC BY-SA 4.0 license.

Model details

  • Base model: BAAI/bge-m3
  • Model size: 0.6B parameters
  • Tensor type: F32 (Safetensors)