Qwen3-14B-PubMedQA-LoRA-Adapters
Model Overview
This repository contains LoRA adapters from fine-tuning the Qwen3-14B model for biomedical question answering. The adapters were trained on the PubMedQA dataset so that the model gives a direct yes/no/maybe answer followed by a detailed explanation.
- Developed by: huseyincavus
- Base Model: unsloth/qwen3-14b-unsloth-bnb-4bit
- License: Apache 2.0
- Model Type: Parameter-Efficient Fine-Tuning (PEFT) using LoRA adapters
- Language: English
- Domain: Biomedical Question Answering
- Project Repository: GitHub
Fine-Tuning Details
This model was fine-tuned efficiently with the Unsloth library, which provides roughly 2x faster training and reduced memory usage compared to a standard training setup. The run was sized to fit on a single free-tier Google Colab T4 GPU.
Training Configuration
- Dataset: PubMedQA (pqa_artificial subset)
- Technique: Low-Rank Adaptation (LoRA); see the configuration sketch after this list
- Quantization: 4-bit quantization for memory efficiency
- Training Steps: 300 (demonstration run)
- Training Time: Approximately 2 hours on T4 GPU
- Optimization: Unsloth acceleration framework
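The notebook records the exact hyperparameters; as a rough illustration, a configuration like the one above is typically expressed in Unsloth as shown below. The rank, alpha, dropout, and target modules here are assumptions for the sketch, not the recorded training values.

```python
from unsloth import FastLanguageModel

# Illustrative LoRA setup; `model` is the 4-bit base model loaded with
# FastLanguageModel.from_pretrained (see Usage below). The rank, alpha,
# and target modules are assumptions, not the notebook's exact values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # lowers memory use on a T4
    random_state=3407,
)
```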
Key Features
- Specialized Domain: Biomedical question answering
- Output Format: Direct yes/no/maybe answer with detailed explanation
- Memory Efficient: 4-bit quantization enables training on limited hardware
- Fast Training: 2x speed improvement with Unsloth optimization
- Lightweight: LoRA adapters require minimal storage and can be easily shared
Model Capabilities
The model is designed to:
- Answer biomedical questions with yes/no/maybe responses
- Provide detailed explanations based on scientific context
- Process complex biomedical literature and research questions
- Maintain accuracy while being resource-efficient
Usage
Loading the Model
```python
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-14b-unsloth-bnb-4bit",
    max_seq_length=2048,
    dtype=None,          # auto-detects bfloat16/float16 for the GPU
    load_in_4bit=True,
)

# Attach the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(model, "huseyincavus/Qwen3-14B-PubMedQA-lora-adapters")
```
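If generation feels slow, Unsloth also provides an optimized inference mode that can be enabled after the adapters are attached; this step is optional and shown here only as a suggestion.

```python
# Optional: switch the model into Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)
```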
Inference Example
```python
from transformers import TextStreamer

# Define the conversation using the chat template structure
system_prompt = "You are a helpful biomedical assistant. Your task is to answer the given question based on the provided context. First, provide a simple 'yes', 'no', or 'maybe' answer, followed by a detailed explanation."
user_question = "Is there a definitive link between coffee consumption and a reduced risk of Parkinson's disease?"
user_context = "Several epidemiological studies have suggested an inverse association between coffee consumption and the risk of Parkinson's disease (PD). A large meta-analysis of 26 studies found that the risk of PD was, on average, 30% lower in coffee drinkers compared to non-drinkers. The association appears to be dose-dependent. However, the mechanism is not fully understood, though caffeine's role as an adenosine A2A receptor antagonist is a leading hypothesis. It's important to note that these are observational studies, which show correlation but cannot prove causation."

# Combine the question and context into the user's message
user_prompt = f"Question: {user_question}\nContext: {user_context}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Apply the chat template - add_generation_prompt=True is crucial
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # prevents Qwen3 "thinking" blocks for direct answers
)

# Tokenize and move to GPU
model_inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Set up streaming for live output
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Generate response with streaming
print("\n" + "=" * 50)
print(" BIOMEDICAL QA MODEL RESPONSE")
print("=" * 50 + "\n")

outputs = model.generate(
    **model_inputs,
    streamer=streamer,
    max_new_tokens=256,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
```
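The streamer prints the answer as it is generated; if you also want the completed response as a Python string (for logging or evaluation), you can decode the newly generated tokens afterwards. This is a small add-on rather than part of the original example.

```python
# Decode only the newly generated tokens (everything after the prompt)
generated_tokens = outputs[0][model_inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```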
Training Process
The complete training pipeline is available in the GitHub repository as the Jupyter notebook Qwen3_(14B)_PubMed_QA.ipynb, which covers:
- Environment Setup: Installation of required libraries (unsloth, transformers, peft, trl)
- Model Loading: Loading Qwen3-14B with 4-bit quantization
- Data Preprocessing: Formatting the PubMedQA dataset for chat-based training (see the sketch after this list)
- Fine-tuning: Using SFTTrainer for parameter-efficient training
- Evaluation: Inference testing on biomedical questions
- Model Saving: Pushing LoRA adapters to Hugging Face Hub
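As a rough illustration of the preprocessing step above, a PubMedQA example can be mapped onto the same system/user/assistant structure used at inference time and serialized with the tokenizer's chat template. The dataset id (qiaojin/PubMedQA), its field names (question, context.contexts, final_decision, long_answer), and the formatting function below reflect the public PubMedQA release on the Hugging Face Hub and are a sketch, not the notebook's exact code; `tokenizer` is assumed to be the one loaded with the base model.

```python
from datasets import load_dataset

# pqa_artificial subset of PubMedQA on the Hugging Face Hub
dataset = load_dataset("qiaojin/PubMedQA", "pqa_artificial", split="train")

SYSTEM_PROMPT = "You are a helpful biomedical assistant. Your task is to answer the given question based on the provided context. First, provide a simple 'yes', 'no', or 'maybe' answer, followed by a detailed explanation."

def format_example(example):
    # Join the context passages and pair them with the question
    context = " ".join(example["context"]["contexts"])
    user_prompt = f"Question: {example['question']}\nContext: {context}"
    # Target = yes/no/maybe label followed by the long-form explanation
    answer = f"{example['final_decision']}. {example['long_answer']}"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": answer},
    ]
    # Serialize the full conversation into a single training string for SFTTrainer
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(format_example)
```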
Performance
In qualitative testing, the fine-tuned model handles biomedical question answering well, providing:
- Accurate yes/no/maybe classifications
- Detailed, context-aware explanations
- Consistent response format suitable for downstream applications
Limitations
- Trained specifically on biomedical domain; may not generalize to other domains
- Limited to English language
- Requires base Qwen3-14B model for full functionality
- Performance may vary on questions outside the PubMedQA distribution
Getting Started
To reproduce the training or use the model:
- Open the notebook in Google Colab
- Enable T4 GPU: Runtime > Change runtime type > T4 GPU
- Run all cells to execute the complete pipeline
- Optional: Set up HF_TOKEN in Colab Secrets to save your own adapters (see the sketch below)
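For that optional last step, a minimal sketch of reading the token from Colab Secrets and pushing trained adapters to the Hub is shown below; the secret name HF_TOKEN matches the step above, while the target repository id is a placeholder you would replace with your own.

```python
from google.colab import userdata  # Colab helper for reading notebook Secrets

hf_token = userdata.get("HF_TOKEN")  # assumes a secret named HF_TOKEN is configured

# Push only the LoRA adapters (not the full base model) to your own Hub repo
model.push_to_hub("your-username/Qwen3-14B-PubMedQA-lora-adapters", token=hf_token)
tokenizer.push_to_hub("your-username/Qwen3-14B-PubMedQA-lora-adapters", token=hf_token)
```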
About This Project
This is a personal project exploring biomedical AI applications. Feel free to use, modify, or build upon this work! If you find it helpful, a mention or star would be appreciated but not required.
Acknowledgments
- Unsloth Team for the acceleration framework
- Qwen Team for the base model
- PubMedQA Dataset creators for the training data
- Hugging Face for the model hosting and tools
