Model Card: google/bigbird-roberta-base (Fine-Tuned for Medical QA)
Model Overview
- Base Model: google/bigbird-roberta-base
- Task: Fine-tuned for Medical Question Answering (QA)
- Architecture: BigBird (sparse-attention transformer) based on RoBERTa-base
- Processes sequences up to 4096 tokens using block-sparse attention.
- Combines local, random, and global attention patterns.
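These sparse-attention settings are exposed through the model configuration in the `transformers` library. A minimal sketch for inspecting them on the base model (attribute names come from `BigBirdConfig`; the values noted in the comments are the library defaults for this architecture, not measurements from this checkpoint):

```python
from transformers import AutoConfig

# Load the base model's configuration to inspect its sparse-attention settings.
config = AutoConfig.from_pretrained("google/bigbird-roberta-base")

print(config.attention_type)           # "block_sparse" (vs. "original_full")
print(config.block_size)               # size of each local attention block
print(config.num_random_blocks)        # random blocks attended to per query block
print(config.max_position_embeddings)  # 4096-token context window
```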
Training Data
- Pretraining (Base Model): Books, CC-News, Stories, Wikipedia
- Fine-tuning (This Model): Medical QA dataset (subset of emrqa-msquad)
Training Procedure
- Epochs: 3
- Batch size: 4 (train), 8 (eval)
- Gradient accumulation steps: 16
- Effective batch size: 64
- Optimizer: AdamW (Hugging Face Trainer default)
- Learning rate: 2e-4
- Weight decay: 0.01
- Hardware: GPU (fp16 enabled)
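A minimal sketch of how these hyperparameters map onto Hugging Face `TrainingArguments`. Only the values listed above come from this card; the output directory and the evaluation cadence are assumptions (the cadence is inferred from the 400-step intervals in the results table below):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration described above.
training_args = TrainingArguments(
    output_dir="bigbird-medical-qa",   # assumed name, not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,    # effective train batch size: 4 * 16 = 64
    learning_rate=2e-4,
    weight_decay=0.01,
    fp16=True,                         # mixed-precision training on GPU
    evaluation_strategy="steps",       # assumed; metrics are reported every 400 steps
    eval_steps=400,
)
```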
Results
| Step | Training Loss | Validation Loss | Exact Match | F1 |
|---|---|---|---|---|
| 400 | No log | 200.938553 | 61.78 | 79.78 |
| 800 | No log | 200.406921 | 81.98 | 90.12 |
| 1200 | No log | 200.226486 | 90.38 | 94.86 |
| 1600 | No log | 200.168106 | 92.93 | 96.30 |
| 2000 | No log | 200.107483 | 95.59 | 97.74 |
| 2400 | No log | 200.063904 | 97.08 | 98.63 |
| 2800 | No log | 200.044708 | 97.91 | 98.99 |
Best checkpoint: Step 2800
- Exact Match: 97.91
- F1: 98.99
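Exact Match and F1 are the standard span-level metrics for extractive QA. A minimal sketch of recomputing them with the `evaluate` library, assuming SQuAD-format predictions (the example records below are placeholders, not outputs from this model):

```python
import evaluate

# SQuAD-style Exact Match / F1, as commonly used for extractive QA.
squad_metric = evaluate.load("squad")

predictions = [{"id": "0", "prediction_text": "metformin"}]                           # placeholder
references = [{"id": "0", "answers": {"text": ["metformin"], "answer_start": [42]}}]  # placeholder

results = squad_metric.compute(predictions=predictions, references=references)
print(results)  # {'exact_match': 100.0, 'f1': 100.0}
```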
Capabilities & Use Cases
- Medical QA: Answering domain-specific questions from medical texts
- Long-context tasks: Works well with long medical research articles, reports, and guidelines
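A minimal inference sketch using the `question-answering` pipeline with this checkpoint (the clinical-style context and question are illustrative placeholders, not data from the training set):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an extractive QA pipeline.
qa = pipeline(
    "question-answering",
    model="Diaa-K/bigbird-roberta-base-finetuned-emrqa-msquad",
)

# Placeholder clinical-style context and question for illustration.
context = (
    "The patient was started on metformin 500 mg twice daily for newly "
    "diagnosed type 2 diabetes mellitus and counselled on diet and exercise."
)
question = "What medication was the patient started on?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```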
Limitations
- Not guaranteed to provide medically accurate or up-to-date advice
- Should not be used as a substitute for professional medical consultation
- Maximum sequence length is 4096 tokens
Citations
Base Model
@misc{zaheer2021big,
  title={Big Bird: Transformers for Longer Sequences},
  author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
  year={2021},
  eprint={2007.14062},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@article{zaheer2020bigbird,
  title={Big Bird: Transformers for Longer Sequences},
  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
Model tree for Diaa-K/bigbird-roberta-base-finetuned-emrqa-msquad
- Base model: google/bigbird-roberta-base