BioForge Stage 3b: OWL

Part of the BioForge Progressive Training Pipeline. NameDropper stage: OWL ontology expansion, adding biomedical ontology knowledge.

Model Overview

This is Stage 3b in the BioForge progressive training curriculum.

Training Details

  • Training Data: OWL ontologies (protein-free)
  • Epochs: 5
  • Batch Size: 1024
  • Architecture: bioformer-8L (BERT-based, 8 layers)
  • Embedding Dimension: 384
  • Max Sequence Length: 1024 tokens

Usage

```python
from sentence_transformers import SentenceTransformer

# Load this model
model = SentenceTransformer("pankajrajdeo/bioforge-namedropper-owl")

# Encode biomedical text
sentences = [
    "Type 2 diabetes mellitus",
    "Myocardial infarction",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```
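The embeddings are most commonly compared with cosine similarity. A minimal NumPy sketch, using random vectors as a stand-in for the (2, 384) array produced above:

```python
import numpy as np

# Hypothetical stand-in for the (2, 384) embeddings returned by model.encode;
# random values replace real model output so the sketch is self-contained.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(2, 384))

# Cosine similarity: L2-normalize each row, then take the dot-product matrix.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms
similarity = unit @ unit.T

print(similarity.shape)  # (2, 2)
# Diagonal entries are 1.0 (each embedding compared with itself).
```

With real embeddings, `sentence_transformers.util.cos_sim(embeddings, embeddings)` computes the same matrix.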

BioForge Training Pipeline

The complete BioForge pipeline consists of:

  1. Stage 1a: PubMed Foundation β†’ pankajrajdeo/bioforge-stage1a-pubmed
  2. Stage 1b: Clinical Trials β†’ pankajrajdeo/bioforge-stage1b-clinical-trials
  3. Stage 1c: UMLS Ontology β†’ pankajrajdeo/bioforge-stage1c-umls
  4. Stage 3b: OWL Ontology (NameDropper) β†’ pankajrajdeo/bioforge-namedropper-owl
  5. Stage 4: Mixed Foundation ⭐ RECOMMENDED β†’ pankajrajdeo/bioforge-stage4-mixed

Recommended Model

For most use cases, we recommend the Stage 4 Mixed model, which combines all training data and gives the best overall performance.

Citation

```bibtex
@software{bioforge2025,
  author = {Pankaj Rajdeo},
  title = {BioForge: Progressive Biomedical Sentence Embeddings},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pankajrajdeo/bioforge-namedropper-owl},
  note = {Stage 3b}
}
```

License

MIT License

Contact

  • Author: Pankaj Rajdeo
  • Institution: Cincinnati Children's Hospital Medical Center
  • Hugging Face: @pankajrajdeo