
Parrotlet-e: Indic Medical Embedding Model

Parrotlet-e is a state-of-the-art multilingual medical embedding model designed for understanding and linking medical terms across Indian languages. It is optimised for entity-level representation of clinical concepts such as symptoms, diagnoses, and anatomical structures, enabling accurate medical coding, semantic search, and cross-lingual retrieval in healthcare applications.

The model is fine-tuned from bge-m3 using weakly supervised contrastive learning with Multi-Similarity Loss on over 18 million multilingual medical term pairs aligned with SNOMED CT and UMLS. It supports both native and romanized scripts across 12 Indic languages and English, and is robust to abbreviations, spelling variations, and colloquial expressions commonly found in clinical documentation.
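
For intuition, here is a minimal PyTorch sketch of the Multi-Similarity Loss objective used in contrastive training. The hyperparameters and the simple in-batch pair construction are illustrative assumptions, not the actual Parrotlet-e training setup, and the published loss also includes a pair-mining step that is omitted here:

import torch

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, base=0.5):
    # embeddings: (batch, dim) L2-normalized term embeddings
    # labels:     (batch,) concept IDs; terms sharing an ID count as positives
    sim = embeddings @ embeddings.t()
    losses = []
    for i in range(sim.size(0)):
        pos_mask = labels == labels[i]
        pos_mask[i] = False                      # drop the self-pair
        neg_mask = labels != labels[i]
        pos_sim, neg_sim = sim[i][pos_mask], sim[i][neg_mask]
        if pos_sim.numel() == 0 or neg_sim.numel() == 0:
            continue
        # Pull same-concept pairs together, push different-concept pairs apart
        pos_loss = torch.log1p(torch.exp(-alpha * (pos_sim - base)).sum()) / alpha
        neg_loss = torch.log1p(torch.exp(beta * (neg_sim - base)).sum()) / beta
        losses.append(pos_loss + neg_loss)
    return torch.stack(losses).mean()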

Supported Indic languages:

  • Hindi
  • Kannada
  • Marathi
  • Malayalam
  • Tamil
  • Telugu
  • Odia
  • Assamese
  • Bengali
  • Urdu
  • Gujarati
  • Punjabi

Loading the model from Hugging Face Hub

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ekacare/parrotlet-e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()  # inference only: disables dropout

# Sample medical terms (can be in any supported language)
texts = [
    "diabetes mellitus",
    "मधुमेह",
    "sugar problem"
]

# Tokenize input
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get model outputs
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state

# Mean pooling over non-padding tokens
attention_mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (embeddings * attention_mask).sum(dim=1) / attention_mask.sum(dim=1)

# L2-normalize so that the dot product equals cosine similarity
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
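
Because the embeddings are unit length, their dot products are cosine similarities, so the three sample terms can be compared directly (a small usage sketch; the exact scores depend on the model weights):

# Pairwise cosine similarities between the sample terms
similarities = embeddings @ embeddings.T
print(similarities)
# If the model links them to the same concept, "diabetes mellitus", "मधुमेह"
# and "sugar problem" should all score close to one another.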

Evaluation Results on Eka-IndicMTEB

We evaluated Parrotlet-e on the Eka-IndicMTEB benchmark using KARMA, reporting Recall@1, Recall@3, and Recall@5 (a short sketch of how Recall@k is computed follows the table).

Model                                           Recall@1   Recall@3   Recall@5
Parrotlet-e                                       0.7206     0.8320     0.8512
cambridgeltl/SapBERT-from-PubMedBERT-fulltext     0.3574     0.4427     0.4684
BAAI/bge-m3                                       0.3146     0.4060     0.4444
google/embeddinggemma-300m                        0.1031     0.1408     0.1525
ai4bharat/IndicBERTv2-MLM-only                    0.0311     0.0573     0.0724
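
As referenced above, here is a minimal sketch of how Recall@k is typically computed for this kind of embedding-based retrieval. The variable names are illustrative; this is not the benchmark's own evaluation code:

import torch

def recall_at_k(query_emb, cand_emb, gold_idx, k):
    # query_emb: (num_queries, dim) and cand_emb: (num_candidates, dim), L2-normalized
    # gold_idx:  (num_queries,) index of the correct candidate concept for each query
    sims = query_emb @ cand_emb.T                       # cosine similarities
    topk = sims.topk(k, dim=1).indices                  # top-k candidates per query
    hits = (topk == gold_idx.unsqueeze(1)).any(dim=1)   # gold concept retrieved?
    return hits.float().mean().item()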

EkaCare Parrotlet-e and the Eka-IndicMTEB benchmark together provide a foundation for building robust, cross-lingual medical AI systems — enabling better coding, documentation, and understanding across India’s diverse clinical landscape.

Authentication (if required)

This repository is publicly visible, but you must accept its access conditions on the model page before the files can be downloaded. Log in to your Hugging Face account, generate an access token at Hugging Face Settings, and set it in your environment:

export HF_TOKEN="your-access-token"

Alternatively, use the Hugging Face CLI to log in:

huggingface-cli login
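
Once the token is set in HF_TOKEN (or you have logged in via the CLI), the from_pretrained calls above pick it up automatically; it can also be passed explicitly, for example:

import os
from transformers import AutoModel, AutoTokenizer

# Read the access token from the environment and pass it explicitly
token = os.environ.get("HF_TOKEN")
tokenizer = AutoTokenizer.from_pretrained("ekacare/parrotlet-e", token=token)
model = AutoModel.from_pretrained("ekacare/parrotlet-e", token=token)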

License

This model is released under the CC BY-SA 4.0 license.

Model details

  • Base model: BAAI/bge-m3
  • Model size: 0.6B parameters
  • Tensor type: F32 (Safetensors)