BioForge 3b: OWL
Part of the BioForge progressive training pipeline. NameDropper stage: OWL ontology expansion for biomedical ontology knowledge.
Model Overview
This is Stage 3b in the BioForge progressive training curriculum.
Training Details
- Training Data: OWL ontologies (protein-free)
- Epochs: 5
- Batch Size: 1024
- Architecture: bioformer-8L (BERT-based, 8 layers)
- Embedding Dimension: 384
- Max Sequence Length: 1024 tokens
Usage
```python
from sentence_transformers import SentenceTransformer

# Load this model
model = SentenceTransformer("pankajrajdeo/bioforge-namedropper-owl")

# Encode medical text
sentences = [
    "Type 2 diabetes mellitus",
    "Myocardial infarction",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```
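Since the model returns plain 384-dimensional vectors, they can be compared with cosine similarity. A minimal sketch with NumPy, using random toy vectors in place of real `model.encode(...)` output (the helper `cos_sim` and the toy data are illustrative, not part of the model card):

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 384-dim embeddings standing in for model.encode(...) output
rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = a + 0.1 * rng.normal(size=384)  # a slightly perturbed copy of a

print(cos_sim(a, b))  # close to 1.0 for near-identical vectors
```

In practice you would pass the rows of `embeddings` (e.g. `embeddings[0]`, `embeddings[1]`) to such a function, or use the `util.cos_sim` helper that ships with sentence-transformers.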
BioForge Training Pipeline
The complete BioForge pipeline consists of:
- Stage 1a: PubMed Foundation (pankajrajdeo/bioforge-stage1a-pubmed)
- Stage 1b: Clinical Trials (pankajrajdeo/bioforge-stage1b-clinical-trials)
- Stage 1c: UMLS Ontology (pankajrajdeo/bioforge-stage1c-umls)
- Stage 3b: OWL Ontology, NameDropper (pankajrajdeo/bioforge-namedropper-owl)
- Stage 4: Mixed Foundation (pankajrajdeo/bioforge-stage4-mixed) - RECOMMENDED
Recommended Model
For most use cases, we recommend the Stage 4 mixed model (pankajrajdeo/bioforge-stage4-mixed), which combines all training data for the best overall performance.
Citation
```bibtex
@software{bioforge2025,
  author = {Pankaj Rajdeo},
  title = {BioForge: Progressive Biomedical Sentence Embeddings},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pankajrajdeo/bioforge-namedropper-owl},
  note = {Stage 3b}
}
```
License
MIT License
Contact
- Author: Pankaj Rajdeo
- Institution: Cincinnati Children's Hospital Medical Center
- Hugging Face: @pankajrajdeo