Qwen3-14B-PubMedQA-LoRA-Adapters
Model Overview
This repository contains LoRA adapters from fine-tuning the Qwen3-14B model for biomedical question answering. The adapters were trained on the PubMedQA dataset so that the model gives a direct yes/no/maybe answer followed by a detailed explanation.
- Developed by: huseyincavus
- Base Model: unsloth/qwen3-14b-unsloth-bnb-4bit
- License: Apache 2.0
- Model Type: Parameter-Efficient Fine-Tuning (PEFT) using LoRA adapters
- Language: English
- Domain: Biomedical Question Answering
- Project Repository: GitHub
Fine-Tuning Details
This model was fine-tuned efficiently with the Unsloth library, which provides roughly 2x faster training and reduced memory usage compared to a standard training setup. The run was sized to fit on a single free-tier Google Colab T4 GPU.
Training Configuration
- Dataset: PubMedQA (pqa_artificial subset)
- Technique: Low-Rank Adaptation (LoRA); see the configuration sketch after this list
- Quantization: 4-bit quantization for memory efficiency
- Training Steps: 300 (demonstration run)
- Training Time: Approximately 2 hours on T4 GPU
- Optimization: Unsloth acceleration framework
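The notebook records the exact hyperparameters; as a rough illustration, a configuration like the one above is typically expressed in Unsloth as shown below. The rank, alpha, dropout, and target modules here are assumptions for the sketch, not the recorded training values.

```python
from unsloth import FastLanguageModel

# Illustrative LoRA setup; `model` is the 4-bit base model loaded with
# FastLanguageModel.from_pretrained (see Usage below). The rank, alpha,
# and target modules are assumptions, not the notebook's exact values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # lowers memory use on a T4
    random_state=3407,
)
```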
Key Features
- Specialized Domain: Biomedical question answering
- Output Format: Direct yes/no/maybe answer with detailed explanation
- Memory Efficient: 4-bit quantization enables training on limited hardware
- Fast Training: 2x speed improvement with Unsloth optimization
- Lightweight: LoRA adapters require minimal storage and can be easily shared
Model Capabilities
The model is designed to:
- Answer biomedical questions with yes/no/maybe responses
- Provide detailed explanations based on scientific context
- Process complex biomedical literature and research questions
- Maintain accuracy while being resource-efficient
Usage
Loading the Model
```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-14b-unsloth-bnb-4bit",
    max_seq_length=2048,
    dtype=None,          # auto-detects bfloat16/float16 for the GPU
    load_in_4bit=True,
)

# Attach the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(model, "huseyincavus/Qwen3-14B-PubMedQA-lora-adapters")
```
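If generation feels slow, Unsloth also provides an optimized inference mode that can be enabled after the adapters are attached; this step is optional and shown here only as a suggestion.

```python
# Optional: switch the model into Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)
```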
Inference Example
```python
from transformers import TextStreamer

# Define the conversation using the chat template structure
system_prompt = "You are a helpful biomedical assistant. Your task is to answer the given question based on the provided context. First, provide a simple 'yes', 'no', or 'maybe' answer, followed by a detailed explanation."
user_question = "Is there a definitive link between coffee consumption and a reduced risk of Parkinson's disease?"
user_context = "Several epidemiological studies have suggested an inverse association between coffee consumption and the risk of Parkinson's disease (PD). A large meta-analysis of 26 studies found that the risk of PD was, on average, 30% lower in coffee drinkers compared to non-drinkers. The association appears to be dose-dependent. However, the mechanism is not fully understood, though caffeine's role as an adenosine A2A receptor antagonist is a leading hypothesis. It's important to note that these are observational studies, which show correlation but cannot prove causation."

# Combine the question and context into the user's message
user_prompt = f"Question: {user_question}\nContext: {user_context}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Apply the chat template - add_generation_prompt=True is crucial
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # prevents Qwen3 "thinking" blocks for direct answers
)

# Tokenize and move to GPU
model_inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Set up streaming for live output
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Generate response with streaming
print("\n" + "=" * 50)
print(" BIOMEDICAL QA MODEL RESPONSE")
print("=" * 50 + "\n")

outputs = model.generate(
    **model_inputs,
    streamer=streamer,
    max_new_tokens=256,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
```
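The streamer prints the answer as it is generated; if you also want the completed response as a Python string (for logging or evaluation), you can decode the newly generated tokens afterwards. This is a small add-on rather than part of the original example.

```python
# Decode only the newly generated tokens (everything after the prompt)
generated_tokens = outputs[0][model_inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```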
Training Process
The complete training pipeline is available in the GitHub repository as the Jupyter notebook Qwen3_(14B)_PubMed_QA.ipynb, which covers:
- Environment Setup: Installation of required libraries (unsloth, transformers, peft, trl)
- Model Loading: Loading Qwen3-14B with 4-bit quantization
- Data Preprocessing: Formatting the PubMedQA dataset for chat-based training (see the sketch after this list)
- Fine-tuning: Using SFTTrainer for parameter-efficient training
- Evaluation: Inference testing on biomedical questions
- Model Saving: Pushing LoRA adapters to Hugging Face Hub
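As a rough illustration of the preprocessing step above, a PubMedQA example can be mapped onto the same system/user/assistant structure used at inference time and serialized with the tokenizer's chat template. The dataset id (qiaojin/PubMedQA), its field names (question, context.contexts, final_decision, long_answer), and the formatting function below reflect the public PubMedQA release on the Hugging Face Hub and are a sketch, not the notebook's exact code; `tokenizer` is assumed to be the one loaded with the base model.

```python
from datasets import load_dataset

# pqa_artificial subset of PubMedQA on the Hugging Face Hub
dataset = load_dataset("qiaojin/PubMedQA", "pqa_artificial", split="train")

SYSTEM_PROMPT = "You are a helpful biomedical assistant. Your task is to answer the given question based on the provided context. First, provide a simple 'yes', 'no', or 'maybe' answer, followed by a detailed explanation."

def format_example(example):
    # Join the context passages and pair them with the question
    context = " ".join(example["context"]["contexts"])
    user_prompt = f"Question: {example['question']}\nContext: {context}"
    # Target = yes/no/maybe label followed by the long-form explanation
    answer = f"{example['final_decision']}. {example['long_answer']}"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": answer},
    ]
    # Serialize the full conversation into a single training string for SFTTrainer
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(format_example)
```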
Performance
In qualitative testing, the fine-tuned model handles biomedical question answering well, providing:
- Accurate yes/no/maybe classifications
- Detailed, context-aware explanations
- Consistent response format suitable for downstream applications
Limitations
- Trained specifically on biomedical domain; may not generalize to other domains
- Limited to English language
- Requires base Qwen3-14B model for full functionality
- Performance may vary on questions outside the PubMedQA distribution
Getting Started
To reproduce the training or use the model:
- Open the notebook in Google Colab
- Enable T4 GPU: Runtime > Change runtime type > T4 GPU
- Run all cells to execute the complete pipeline
- Optional: Set up HF_TOKEN in Colab Secrets to save your own adapters (see the sketch below)
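For that optional last step, a minimal sketch of reading the token from Colab Secrets and pushing trained adapters to the Hub is shown below; the secret name HF_TOKEN matches the step above, while the target repository id is a placeholder you would replace with your own.

```python
from google.colab import userdata  # Colab helper for reading notebook Secrets

hf_token = userdata.get("HF_TOKEN")  # assumes a secret named HF_TOKEN is configured

# Push only the LoRA adapters (not the full base model) to your own Hub repo
model.push_to_hub("your-username/Qwen3-14B-PubMedQA-lora-adapters", token=hf_token)
tokenizer.push_to_hub("your-username/Qwen3-14B-PubMedQA-lora-adapters", token=hf_token)
```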
About This Project
This is a personal project exploring biomedical AI applications. Feel free to use, modify, or build upon this work! If you find it helpful, a mention or star would be appreciated but not required.
Acknowledgments
- Unsloth Team for the acceleration framework
- Qwen Team for the base model
- PubMedQA Dataset creators for the training data
- Hugging Face for the model hosting and tools
