BioForge Stage 3b: OWL

Part of the BioForge Progressive Training Pipeline. NameDropper stage: OWL ontology expansion, adding biomedical ontology knowledge.

Model Overview

This is Stage 3b in the BioForge progressive training curriculum.

Training Details

  • Training Data: OWL ontologies (protein-free)
  • Epochs: 5
  • Batch Size: 1024
  • Architecture: bioformer-8L (BERT-based, 8 layers)
  • Embedding Dimension: 384
  • Max Sequence Length: 1024 tokens

Usage

```python
from sentence_transformers import SentenceTransformer

# Load this model
model = SentenceTransformer("pankajrajdeo/bioforge-namedropper-owl")

# Encode biomedical text
sentences = [
    "Type 2 diabetes mellitus",
    "Myocardial infarction",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```
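The embeddings are most commonly compared with cosine similarity. A minimal NumPy sketch, using random vectors as a stand-in for the (2, 384) array produced above:

```python
import numpy as np

# Hypothetical stand-in for the (2, 384) embeddings returned by model.encode;
# random values replace real model output so the sketch is self-contained.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(2, 384))

# Cosine similarity: L2-normalize each row, then take the dot-product matrix.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms
similarity = unit @ unit.T

print(similarity.shape)  # (2, 2)
# Diagonal entries are 1.0 (each embedding compared with itself).
```

With real embeddings, `sentence_transformers.util.cos_sim(embeddings, embeddings)` computes the same matrix.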

BioForge Training Pipeline

The complete BioForge pipeline consists of:

  1. Stage 1a: PubMed Foundation β†’ pankajrajdeo/bioforge-stage1a-pubmed
  2. Stage 1b: Clinical Trials β†’ pankajrajdeo/bioforge-stage1b-clinical-trials
  3. Stage 1c: UMLS Ontology β†’ pankajrajdeo/bioforge-stage1c-umls
  4. Stage 3b: OWL Ontology (NameDropper) β†’ pankajrajdeo/bioforge-namedropper-owl
  5. Stage 4: Mixed Foundation ⭐ RECOMMENDED β†’ pankajrajdeo/bioforge-stage4-mixed

Recommended Model

For most use cases, we recommend the Stage 4 Mixed model, which combines all training data and gives the best overall performance.

Citation

```bibtex
@software{bioforge2025,
  author = {Pankaj Rajdeo},
  title = {BioForge: Progressive Biomedical Sentence Embeddings},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pankajrajdeo/bioforge-namedropper-owl},
  note = {Stage 3b}
}
```

License

MIT License

Contact

  • Author: Pankaj Rajdeo
  • Institution: Cincinnati Children's Hospital Medical Center
  • Hugging Face: @pankajrajdeo