# BiomedBERT Reranker
This is a Cross Encoder model finetuned from microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
The training dataset was generated using a random sample of PubMed title-abstract pairs along with similar title pairs.
## Usage (txtai)
This model can be used to score a list of text pairs. This is useful as a reranking pipeline after an initial semantic search operation.
```python
from txtai.pipeline import Similarity

# Load the model as a cross-encoder similarity pipeline
ranker = Similarity(path="neuml/biomedbert-base-reranker", crossencode=True)

# Returns a list of (index, score) tuples sorted by highest score
ranker("query", ["document1", "document2"])
```
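To make the retrieve-then-rerank pattern concrete, here is a minimal, self-contained sketch of rescoring first-stage results and keeping the best candidates. The `overlap_scorer` is a toy word-overlap stand-in for the model (so the example runs without downloading weights); in practice the cross encoder loaded above fills that role.

```python
def rerank(query, documents, scorer, limit=3):
    """Rescore documents with a per-document scorer and keep the top results."""
    scores = scorer(query, documents)
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]

def overlap_scorer(query, documents):
    # Toy stand-in scorer: counts shared words between query and document.
    terms = set(query.lower().split())
    return [len(terms & set(doc.lower().split())) for doc in documents]

results = rerank(
    "insulin resistance",
    [
        "Insulin resistance in type 2 diabetes",
        "A survey of graph databases",
        "Metformin and insulin sensitivity",
    ],
    overlap_scorer,
    limit=2,
)
```

The same `rerank` shape works with any scorer that returns one relevance score per document.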
## Usage (Sentence-Transformers)
Alternatively, the model can be loaded with sentence-transformers.
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("neuml/biomedbert-base-reranker")

# Returns an array of relevance scores, one per (query, document) pair
model.predict([["query", "document1"], ["query", "document2"]])
```
## Evaluation Results
Performance of this model is compared to previously released models trained on medical literature.
The following datasets were used to evaluate model performance.
- PubMed QA
  - Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
- PubMed Subset
  - Split: test, Pair: (title, text)
- PubMed Summary
  - Subset: pubmed, Split: validation, Pair: (article, abstract)
Evaluation results are shown below. The Pearson correlation coefficient is used as the evaluation metric.
| Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 90.40 | 95.92 | 94.07 | 93.46 |
| bioclinical-modernbert-base-embeddings | 92.49 | 97.10 | 97.04 | 95.54 |
| biomedbert-base-colbert | 94.59 | 97.18 | 96.21 | 95.99 |
| biomedbert-base-reranker | 97.66 | 99.76 | 98.81 | 98.74 |
| pubmedbert-base-embeddings | 93.27 | 97.00 | 96.58 | 95.62 |
| pubmedbert-base-embeddings-8M | 90.05 | 94.29 | 94.15 | 92.83 |
As expected, this cross-encoder model scores much higher than the bi-encoder and late interaction models. The tradeoff is cost: a cross encoder must score every query-document pair at query time, so it cannot precompute document representations and does not scale past small candidate sets. That makes it best suited as a re-ranking stage over the top results of a faster first-stage search of medical literature.
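For reference, the Pearson correlation used in the table above measures how linearly model scores track gold relevance labels. A toy computation with numpy (made-up numbers, not the actual evaluation data):

```python
import numpy as np

# Made-up scores for illustration only
gold = np.array([1.0, 0.0, 1.0, 0.0, 1.0])  # gold relevance labels
pred = np.array([0.9, 0.2, 0.8, 0.1, 0.7])  # model scores

# Pearson correlation coefficient, scaled to 0-100 as in the table above
r = np.corrcoef(gold, pred)[0, 1]
print(round(r * 100, 2))
```

A model whose scores rise and fall with the labels scores near 100 on this metric.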