Model Card: google/bigbird-roberta-base (Fine-Tuned for Medical QA)

Model Overview

  • Model: Diaa-K/bigbird-roberta-base-finetuned-emrqa-msquad (~0.1B parameters)
  • Base Model: google/bigbird-roberta-base
  • Task: Fine-tuned for Medical Question Answering (QA)
  • Architecture: BigBird (sparse-attention transformer) based on RoBERTa-base
    • Processes sequences up to 4096 tokens using block-sparse attention.
    • Combines local, random, and global attention patterns.
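
The sparse-attention setup can be inspected from the base checkpoint's configuration. The snippet below is a minimal sketch; the commented values are the published defaults for google/bigbird-roberta-base.

```python
from transformers import AutoConfig

# Inspect the block-sparse attention settings of the base checkpoint.
config = AutoConfig.from_pretrained("google/bigbird-roberta-base")

print(config.attention_type)           # "block_sparse": local + random + global attention
print(config.block_size)               # tokens per attention block (64 by default)
print(config.num_random_blocks)        # random blocks each query block attends to (3 by default)
print(config.max_position_embeddings)  # maximum sequence length (4096)
```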

Training Data

  • Pretraining (Base Model): Books, CC-News, Stories, Wikipedia
  • Fine-tuning (This Model): Medical QA dataset (a subset of emrqa-msquad)

Training Procedure

  • Epochs: 3
  • Batch size: 4 (train), 8 (eval)
  • Gradient accumulation steps: 16
  • Effective batch size: 64
  • Optimizer: AdamW (Hugging Face Trainer default)
  • Learning rate: 2e-4
  • Weight decay: 0.01
  • Hardware: GPU (fp16 enabled)
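
The hyperparameters above map roughly onto the following Hugging Face TrainingArguments. This is a minimal sketch rather than the exact training script; output_dir and the commented Trainer wiring are assumptions, and dataset preprocessing is omitted.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, TrainingArguments

base = "google/bigbird-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForQuestionAnswering.from_pretrained(base)

training_args = TrainingArguments(
    output_dir="bigbird-roberta-base-finetuned-emrqa-msquad",  # illustrative
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,  # effective train batch size: 4 * 16 = 64
    learning_rate=2e-4,
    weight_decay=0.01,
    fp16=True,
)

# AdamW is the Trainer's default optimizer, so none needs to be passed explicitly:
# trainer = Trainer(model=model, args=training_args, train_dataset=..., eval_dataset=...)
# trainer.train()
```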

Results

Step   Training Loss   Validation Loss   Exact Match   F1
 400   No log          200.938553        61.78         79.78
 800   No log          200.406921        81.98         90.12
1200   No log          200.226486        90.38         94.86
1600   No log          200.168106        92.93         96.30
2000   No log          200.107483        95.59         97.74
2400   No log          200.063904        97.08         98.63
2800   No log          200.044708        97.91         98.99

Best checkpoint: Step 2800

  • Exact Match: 97.91
  • F1: 98.99
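
Exact Match and F1 are the standard extractive-QA metrics; since emrqa-msquad follows the SQuAD format, they were presumably computed SQuAD-style. The sketch below shows how such scores are typically obtained with the evaluate library (the tooling and the toy example are assumptions, not taken from the training script).

```python
import evaluate

squad_metric = evaluate.load("squad")

predictions = [{"id": "q1", "prediction_text": "metformin"}]
references = [{"id": "q1", "answers": {"text": ["metformin"], "answer_start": [42]}}]

# Returns a dict like {"exact_match": 100.0, "f1": 100.0}, aggregated over all examples.
print(squad_metric.compute(predictions=predictions, references=references))
```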

Capabilities & Use Cases

  • Medical QA: Answering domain-specific questions from medical texts
  • Long-context tasks: Works well with long medical research articles, reports, and guidelines
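
A minimal inference sketch using the question-answering pipeline. The Hub id is the one listed for this model's repository; the clinical note and question are illustrative.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="Diaa-K/bigbird-roberta-base-finetuned-emrqa-msquad")

context = (
    "The patient was started on metformin 500 mg twice daily for newly diagnosed "
    "type 2 diabetes. Lisinopril was continued for hypertension."
)
result = qa(question="What medication was started for diabetes?", context=context)

print(result["answer"], result["score"])  # extracted span and confidence score
```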

Limitations

  • Not guaranteed to provide medically accurate or up-to-date advice
  • Should not be used as a substitute for professional medical consultation
  • Maximum sequence length is 4096 tokens
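
Because inputs beyond 4096 tokens cannot be handled in a single pass, it can help to check the tokenized length up front. The helper below is a hypothetical sketch; note that simple truncation drops the overflowing text, so answers located past the cutoff will be missed.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

def fits_in_context(question: str, document: str, max_length: int = 4096) -> bool:
    # Count tokens for the question/document pair exactly as the model would see them.
    ids = tokenizer(question, document, add_special_tokens=True)["input_ids"]
    return len(ids) <= max_length

print(fits_in_context("What medication was started?", "A long clinical note ..."))
```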

Citations

Base Model

@misc{zaheer2021big,
  title={Big Bird: Transformers for Longer Sequences},
  author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
  year={2021},
  eprint={2007.14062},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@article{zaheer2020bigbird,
  title={Big bird: Transformers for longer sequences},
  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}