BioForge: Stage 1c: UMLS Ontology

Part of the BioForge Progressive Training Collection

Progressive biomedical sentence embeddings trained on 50M+ PubMed abstracts, clinical trials, UMLS ontology, and OWL biomedical ontologies.


πŸš€ Quick Start

```python
from sentence_transformers import SentenceTransformer, util

# Load the model
model = SentenceTransformer("pankajrajdeo/bioforge-stage1c-umls")

# Encode biomedical text
sentences = [
    "Type 2 diabetes mellitus with hyperglycemia",
    "Myocardial infarction with ST-elevation",
    "Chronic obstructive pulmonary disease exacerbation",
]

embeddings = model.encode(sentences)
print(f"Embeddings: {embeddings.shape}")  # (3, 384)

# Compute pairwise cosine similarities
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
```
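
Since the scores here are cosine similarities, you can also ask `encode` to L2-normalize the embeddings so that a plain dot product gives the same values (`normalize_embeddings` is a standard sentence-transformers `encode` option); a minimal sketch:

```python
# Optional: L2-normalize at encode time so dot product == cosine similarity.
normalized = model.encode(sentences, normalize_embeddings=True)
print(normalized @ normalized.T)  # matches util.cos_sim(embeddings, embeddings)
```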

πŸ“Š Comprehensive Evaluation Results

Comparison with State-of-the-Art Biomedical Models

We evaluated BioForge against 16 biomedical embedding models on 5 key benchmarks. The tables below show the top-ranking models on each benchmark and where the BioForge models place.
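
For reference, P@k, R@k, MAP@k, and nDCG@k are defined per query as sketched below; the table values are these scores averaged over all benchmark queries. This is a minimal binary-relevance sketch, not the actual MTEB evaluation harness:

```python
import math

# Minimal binary-relevance retrieval metrics for one query.
# `ranked` is the model's ranked list of doc ids; `relevant` is the gold set.

def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant.
    return sum(d in relevant for d in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    # Fraction of all relevant docs retrieved within the top k.
    return sum(d in relevant for d in ranked[:k]) / len(relevant)

def average_precision_at_k(ranked, relevant, k):
    # MAP@k in the tables is this value averaged over queries.
    hits, ap = 0, 0.0
    for i, d in enumerate(ranked[:k]):
        if d in relevant:
            hits += 1
            ap += hits / (i + 1)
    return ap / min(len(relevant), k)

def ndcg_at_k(ranked, relevant, k):
    # Discounted cumulative gain over the ideal ranking (binary relevance).
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d3", "d2"}
print(precision_at_k(ranked, relevant, 1))  # P@1 = 1.0
print(ndcg_at_k(ranked, relevant, 10))      # ≈0.88
```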


TREC-COVID: COVID-19 Literature Retrieval

| Model | P@1 | R@10 | MAP@10 | nDCG@10 |
|-------|-----|------|--------|---------|
| MedEmbed-small-v0.1 | 90.0% | 0.3% | 94.0% | 95.5% |
| MedEmbed-large-v0.1 | 84.0% | 0.3% | 91.4% | 93.6% |
| MedEmbed-base-v0.1 | 80.0% | 0.3% | 89.3% | 92.1% |
| cchmc-bioembed-pubmed-umls | 78.0% | 0.3% | 85.9% | 89.4% |
| S-PubMedBert-MS-MARCO | 78.0% | 0.3% | 85.6% | 88.2% |
| MedCPT-Query-Encoder | 66.0% | 0.3% | 78.1% | 82.6% |
| Bioformer-16L (Stage 1c) | 68.0% | 0.3% | 77.1% | 81.8% |
| Bioformer-8L (Stage 1c) | 60.0% | 0.3% | 72.5% | 78.7% |
| cchmc-bioembed-pubmed | 62.0% | 0.2% | 74.1% | 78.6% |
| all-MiniLM-L6-v2 | 62.0% | 0.2% | 72.2% | 76.6% |

BioForge Note: Our Stage 4 model focuses on balanced performance across all biomedical tasks rather than specializing in COVID-19 literature.


BioASQ: Biomedical Semantic Indexing

| Model | P@1 | R@10 | MAP@10 | nDCG@10 |
|-------|-----|------|--------|---------|
| MedEmbed-large-v0.1 | 76.8% | 28.2% | 82.5% | 84.9% |
| MedEmbed-base-v0.1 | 74.3% | 27.2% | 80.2% | 82.8% |
| MedEmbed-small-v0.1 | 74.0% | 27.1% | 79.7% | 82.2% |
| S-PubMedBert-MS-MARCO | 73.0% | 27.1% | 79.3% | 82.1% |
| cchmc-bioembed-pubmed-umls | 64.9% | 25.0% | 72.3% | 75.6% |
| cchmc-bioembed-pubmed | 63.3% | 24.1% | 70.5% | 73.9% |
| all-MiniLM-L6-v2 | 60.9% | 23.1% | 68.2% | 71.6% |
| Bioformer-8L (Stage 1c) | 60.3% | 23.2% | 67.7% | 71.1% |
| Bioformer-16L (Stage 1c) | 59.3% | 23.1% | 66.7% | 70.2% |

PubMedQA: PubMed Question Answering

| Model | P@1 | R@10 | MAP@10 | nDCG@10 |
|-------|-----|------|--------|---------|
| cchmc-bioembed-pubmed | 77.1% | 93.6% | 83.0% | 85.6% |
| Bioformer-16L (Stage 1c) | 75.2% | 93.0% | 81.6% | 84.4% |
| Bioformer-8L (Stage 1c) | 73.7% | 92.0% | 80.2% | 83.1% |
| S-PubMedBert-MS-MARCO | 69.3% | 87.3% | 75.5% | 78.3% |
| MedEmbed-large-v0.1 | 68.4% | 87.5% | 74.9% | 78.0% |
| MedEmbed-base-v0.1 | 68.3% | 87.1% | 74.7% | 77.7% |
| all-MiniLM-L6-v2 | 53.5% | 73.9% | 60.1% | 63.4% |

BioForge Strength: Our models rank #2-3 on PubMedQA, significantly outperforming general-purpose and many specialized models (+21.7 points P@1 over all-MiniLM-L6-v2).


MIRIAD QA: Medical Information Retrieval

| Model | P@1 | R@10 | MAP@10 | nDCG@10 |
|-------|-----|------|--------|---------|
| MedEmbed-large-v0.1 | 99.0% | 100.0% | 99.5% | 99.6% |
| MedEmbed-base-v0.1 | 98.9% | 100.0% | 99.4% | 99.5% |
| MedEmbed-small-v0.1 | 98.5% | 99.9% | 99.1% | 99.3% |
| S-PubMedBert-MS-MARCO | 97.9% | 99.9% | 98.7% | 99.0% |
| cchmc-bioembed-pubmed | 96.3% | 99.8% | 97.7% | 98.3% |
| Bioformer-8L (Stage 1c) | 96.2% | 99.7% | 97.6% | 98.2% |
| Bioformer-16L (Stage 1c) | 96.0% | 99.8% | 97.5% | 98.1% |
| all-MiniLM-L6-v2 | 94.8% | 99.5% | 96.7% | 97.4% |

BioForge Performance: Ranks #6-7 on MIRIAD QA with 96%+ P@1, performing comparably to top specialized models.


SciFact: Scientific Fact Verification

| Model | P@1 | R@10 | MAP@10 | nDCG@10 |
|-------|-----|------|--------|---------|
| MedEmbed-large-v0.1 | 61.7% | 83.3% | 69.9% | 74.2% |
| MedEmbed-base-v0.1 | 61.0% | 83.2% | 69.9% | 74.2% |
| cchmc-bioembed-pubmed | 59.7% | 82.2% | 68.5% | 72.9% |
| MedEmbed-small-v0.1 | 59.3% | 81.0% | 67.8% | 72.0% |
| Bioformer-8L (Stage 1c) | 56.0% | 79.8% | 65.3% | 69.9% |
| Bioformer-16L (Stage 1c) | 54.7% | 82.2% | 64.9% | 70.1% |
| S-PubMedBert-MS-MARCO | 55.7% | 78.2% | 64.5% | 68.8% |
| all-MiniLM-L6-v2 | 50.3% | 75.8% | 60.7% | 65.4% |

🎯 Key Findings

  • βœ… Top-3 Performance on PubMedQA: BioForge ranks 2nd-3rd among 16 models
  • βœ… Strong MIRIAD QA Results: 96%+ P@1, competitive with specialized models
  • βœ… Balanced Across Tasks: Consistent performance on all biomedical benchmarks
  • βœ… Better than General Models: Significantly outperforms all-MiniLM-L6-v2 on biomedical tasks

πŸ“ˆ BioForge Stage 4 (Recommended)

The Stage 4 Mixed model combines all training stages for the best overall performance:

  • Progressive training: PubMed β†’ Clinical Trials β†’ UMLS β†’ OWL β†’ Mixed
  • 2.35M training pairs from diverse biomedical sources
  • Optimized for general-purpose biomedical embedding

When to use different models (see the loading sketch after this list):

  • PubMedQA focus: Stage 1a or 1c (best PubMedQA performance)
  • General biomedical: Stage 4 (balanced, recommended)
  • Ontology tasks: BOND (OWL ontology focused)
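
A minimal loading sketch; only the Stage 1c repo id is confirmed on this page, so the Stage 4 id below is a hypothetical placeholder (check the collection for the exact names):

```python
from sentence_transformers import SentenceTransformer

MODELS = {
    "pubmedqa": "pankajrajdeo/bioforge-stage1c-umls",  # this model (Stage 1c)
    "general": "pankajrajdeo/bioforge-stage4-mixed",   # hypothetical Stage 4 id -- verify in the collection
}

model = SentenceTransformer(MODELS["pubmedqa"])
```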

πŸ“– Models Compared

Top Performers:

  • MedEmbed Series (small/base/large) - Specialized biomedical models
  • S-PubMedBert-MS-MARCO - PubMed BERT with MS MARCO training
  • cchmc-bioembed Series - earlier BioForge versions

Baseline Models:

  • all-MiniLM-L6-v2 - General-purpose sentence transformer
  • pubmedbert-base-embeddings - PubMed BERT embeddings
  • MedCPT - Medical contrastive pre-training models

Note: All metrics are from actual evaluations on MTEB biomedical benchmarks. No synthetic or estimated values.


πŸ”„ BioForge Training Pipeline

```
Stage 1a: PubMed (50M+ abstracts)
    ↓
Stage 1b: + Clinical Trials (1M+ trials)
    ↓
Stage 1c: + UMLS Ontology
    ↓
BOND: + OWL Ontologies
    ↓
Stage 4: Mixed Foundation ⭐ RECOMMENDED
```

Current Model: Stage 1c: UMLS Ontology
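
Each stage continues training from the previous checkpoint. As illustration, here is a minimal sketch of what one progressive fine-tuning stage could look like in sentence-transformers; the loss, pair format, and hyperparameters are assumptions, not the documented BioForge configuration:

```python
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

# Start from the previous stage's checkpoint (here: this Stage 1c model).
model = SentenceTransformer("pankajrajdeo/bioforge-stage1c-umls")

# Hypothetical (term, definition) pairs, e.g. drawn from an ontology.
pairs = [
    InputExample(texts=["myocardial infarction", "necrosis of heart muscle due to ischemia"]),
    InputExample(texts=["hyperglycemia", "abnormally high blood glucose concentration"]),
]

loader = DataLoader(pairs, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
```

MultipleNegativesRankingLoss treats the other pairs in a batch as negatives, a common choice for contrastive training of sentence embedders.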


πŸ’‘ Example: Semantic Search

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("pankajrajdeo/bioforge-stage1c-umls")

# Medical knowledge base
docs = [
    "Metformin reduces hepatic glucose production",
    "Aspirin inhibits platelet aggregation",
    "Statins lower LDL cholesterol levels",
]

# Query
query = "What treats high blood sugar?"

# Embed the corpus and the query, then retrieve the top matches
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}: {docs[hit['corpus_id']]}")
```

πŸ”— Collection

View all BioForge models: Collection


πŸ“– Citation

```bibtex
@software{bioforge2025,
  author = {Pankaj Rajdeo},
  title = {BioForge: Progressive Biomedical Sentence Embeddings},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pankajrajdeo/bioforge-stage1c-umls}
}
```

πŸ“ž Contact

  • Author: Pankaj Rajdeo
  • Institution: Cincinnati Children's Hospital Medical Center
  • Profile: @pankajrajdeo

License: MIT

Model size: 41.5M parameters (Safetensors, F32)