BiomedBERT Reranker

This is a Cross Encoder model fine-tuned from microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

The training dataset was generated using a random sample of PubMed title-abstract pairs along with similar title pairs.

Usage (txtai)

This model can be used to score a list of text pairs. This is useful as a reranking pipeline after an initial semantic search operation.

from txtai.pipeline import Similarity

ranker = Similarity(path="neuml/biomedbert-base-reranker", crossencode=True)
ranker("query", ["document1", "document2"])
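
The Similarity pipeline returns a list of (index, score) tuples sorted from best to worst, where index is the position in the input list. A minimal sketch of mapping those results back to text follows. The helper name attach_text and the scores are hypothetical stand-ins used for illustration, not output from this model.

```python
def attach_text(results, documents):
    """Map (index, score) pairs back to the documents they point to."""
    return [(documents[index], score) for index, score in results]


documents = ["document1", "document2"]

# Stand-in for ranker("query", documents). These scores are made up
# for illustration and are not real model output.
results = [(1, 0.92), (0, 0.15)]

ranked = attach_text(results, documents)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```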

Usage (Sentence-Transformers)

Alternatively, the model can be loaded with sentence-transformers.

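from sentence_transformers import CrossEncoder

# Load the reranker as a CrossEncoder (the class imported above)
model = CrossEncoder("neuml/biomedbert-base-reranker")

# Returns one score per [query, document] pair, in input order
model.predict([["query", "document1"], ["query", "document2"]])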
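
Since predict returns scores in the same order as the input pairs, reranking comes down to sorting documents by score. The sketch below assumes a predict callable with that behavior. rank_documents and toy_predict are hypothetical names, and toy_predict is a word-overlap stand-in so the example runs without downloading the model; in practice model.predict would be passed instead.

```python
def rank_documents(predict, query, documents):
    """Score every (query, document) pair and sort documents best-first."""
    pairs = [[query, doc] for doc in documents]
    scores = [float(score) for score in predict(pairs)]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(documents[i], scores[i]) for i in order]


def toy_predict(pairs):
    """Hypothetical stand-in for model.predict: counts shared words."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


docs = [
    "aspirin reduces fever",
    "statins lower cholesterol",
    "fever and aspirin dosing in adults",
]
ranked = rank_documents(toy_predict, "aspirin fever dosing", docs)
```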

Evaluation Results

Performance of this model is compared to previously released models trained on medical literature.

The following datasets were used to evaluate model performance.

  • PubMed QA
    • Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
  • PubMed Subset
    • Split: test, Pair: (title, text)
  • PubMed Summary
    • Subset: pubmed, Split: validation, Pair: (article, abstract)

Evaluation results are shown below. The Pearson correlation coefficient is used as the evaluation metric.

Model                                   PubMed QA  PubMed Subset  PubMed Summary  Average
all-MiniLM-L6-v2                            90.40          95.92           94.07    93.46
bioclinical-modernbert-base-embeddings      92.49          97.10           97.04    95.54
biomedbert-base-colbert                     94.59          97.18           96.21    95.99
biomedbert-base-reranker                    97.66          99.76           98.81    98.74
pubmedbert-base-embeddings                  93.27          97.00           96.58    95.62
pubmedbert-base-embeddings-8M               90.05          94.29           94.15    92.83
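
For reference, the Pearson correlation coefficient measures how closely two lists of numbers follow a linear relationship, ranging from -1 to 1 (the figures in the table appear to be scaled by 100). The sketch below only shows how the coefficient itself is computed. The sample scores and labels are made up, and the card does not describe the exact labels used in the evaluation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / denom


# Hypothetical model scores and reference labels, for illustration only
predicted = [0.9, 0.1, 0.8, 0.2]
labels = [1.0, 0.0, 1.0, 0.0]
print(round(pearson(predicted, labels) * 100, 2))
```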

As expected, this cross-encoder model scores much higher than the bi-encoder and late interaction models. The tradeoff is cost: every query-document pair needs its own forward pass through the model, so it is impractical to run over a large corpus and is best applied to a small candidate set returned by a faster first-stage search. Within that role, it's a great model for re-ranking medical literature.
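
One common way to manage that cost is a two-stage pipeline: a cheap scorer shortlists candidates and only the shortlist goes to the expensive pairwise scorer. The sketch below is a simplified illustration under that assumption. All function names are hypothetical and both scorers are toy word-overlap functions standing in for a bi-encoder search and this reranker.

```python
def retrieve_then_rerank(query, corpus, cheap_score, pair_score, shortlist_size=2):
    """Shortlist with a cheap scorer, then rerank only the shortlist."""
    shortlist = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:shortlist_size]
    scored = [(doc, pair_score(query, doc)) for doc in shortlist]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def overlap(query, doc):
    """Toy first-stage scorer: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


expensive_calls = []


def toy_pair_score(query, doc):
    """Toy stand-in for the cross-encoder; records each call it receives."""
    expensive_calls.append(doc)
    return overlap(query, doc) / len(doc.split())


corpus = [
    "aspirin for fever",
    "statins lower cholesterol",
    "fever management with aspirin in children and adults",
    "knee surgery outcomes",
]
results = retrieve_then_rerank("aspirin fever", corpus, overlap, toy_pair_score)
print(len(expensive_calls), "expensive calls for", len(corpus), "documents")
```

The expensive scorer runs once per shortlisted document rather than once per document in the corpus, which is what keeps the reranking step affordable.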
