Medical Reasoning Model: Qwen2.5-0.5B (LoRA/QLoRA Fine-tuned)

A compact medical-reasoning model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct for step-by-step clinical reasoning. It is optimized to break down cases, propose differentials, and justify answers.

Note: Outputs may include intermediate reasoning. Do not treat responses as medical advice.


🧠 Model Summary

  • Base: Qwen2.5-0.5B-Instruct
  • Architecture: Decoder-only transformer
  • Finetuning: LoRA/QLoRA on medical reasoning SFT data
  • Primary language: English
  • Intended tasks: clinical vignette reasoning, justification, treatment option comparison, dosage math (simple)
  • Not for: real-world diagnosis or treatment; not designed to handle protected health information (PHI)

✅ Intended Use

  • Educational exploration of clinical reasoning
  • Generating explanations for practice questions
  • Drafting step-by-step rationales for study aids

Out-of-scope / High-risk
Do not use for clinical decisions, triage, or patient-specific recommendations.


📚 Data

  • Training set: FreedomIntelligence/medical-o1-reasoning-SFT (instruction-style, reasoning-focused)
  • Preprocessing:
    • Deduplication & light cleaning
    • Prompt formatting into {system, instruction, input} style
    • Truncation/padding to max sequence length (🔧 L=1024 by default)
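A minimal formatting sketch of this step; the field names question/answer and the system prompt below are assumptions, so map them to the dataset's actual columns:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
SYSTEM = "You are a careful clinician. Reason step by step before answering."  # assumed system prompt

def format_example(example, max_len=1024):
    # example["question"] / example["answer"] are assumed field names.
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    text = tok.apply_chat_template(messages, tokenize=False)
    # Truncate/pad to the max sequence length used for training.
    return tok(text, truncation=True, max_length=max_len, padding="max_length")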

🛠️ Training Procedure

Strategy

  • Method: LoRA / QLoRA (4-bit NF4) to fit small GPUs while retaining quality
  • Rationale: adapters reduce trainable parameters → faster, cheaper training with lower overfitting risk

Environment (example)

  • GPU: 🔧 1× NVIDIA RTX 4060 (8 GB) / L4 / A10G
  • Frameworks: transformers, peft, bitsandbytes, accelerate
  • Precision: bf16 (fall back to fp16 if unsupported)
  • Attention: Flash-Attention 2 when available
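A minimal loading sketch for the two points above; drop the attn_implementation kwarg if the flash-attn package is not installed:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.bfloat16,               # fall back to torch.float16 on GPUs without bf16
    attn_implementation="flash_attention_2",  # requires flash-attn; omit to use the default backend
)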

Key Hyperparameters (replace with your own values if they differ)

  • LoRA config: r=8, alpha=16, lora_dropout=0.1, target_modules=["q_proj","v_proj","o_proj"]
  • Quantization (QLoRA): load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True
  • Optimizer: AdamW8bit (bitsandbytes) or adamw_torch_fused
  • LR & schedule: lr=2e-4, cosine decay, warmup_ratio=0.06
  • Regularization: weight decay 0.01, label smoothing 0.05, gradient clipping 1.0
  • Batching: per_device_train_batch_size=4, gradient_accumulation_steps=8 → effective batch size 32
  • Sequence length: 🔧 max_seq_length=1024
  • Epochs: 🔧 2–3 (use early stopping on val loss)
  • Gradient checkpointing: enabled (reduce activation memory)
  • Eval/Save: every 🔧 500 steps, keep best on val loss
  • Checkpoint offload: keep the most recent N checkpoints locally and sync each save to S3 via a Trainer callback (sketch below)
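A minimal sketch of such a callback; S3SyncCallback is a hypothetical helper (not part of transformers), and it assumes boto3 credentials plus a bucket/prefix of your own:

import os
import boto3
from transformers import TrainerCallback

class S3SyncCallback(TrainerCallback):  # hypothetical name
    def __init__(self, bucket, prefix):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix

    def on_save(self, args, state, control, **kwargs):
        # Upload the checkpoint that was just written, mirroring the local layout under the prefix.
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        for root, _, files in os.walk(ckpt_dir):
            for name in files:
                local_path = os.path.join(root, name)
                key = f"{self.prefix}/{os.path.relpath(local_path, args.output_dir)}"
                self.s3.upload_file(local_path, self.bucket, key)
        return control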

Reproducibility

  • seed=42
  • Deterministic ops where practical (note: may reduce throughput)

⏱️ Training Time & Throughput (example on RTX 4060)

  • Throughput (train): ~15k tokens/s (depends on sequence length, micro-batch size, and kernels)
  • Steps/epoch: ceil(num_samples / effective_batch)
    e.g., 🔧 100k / 32 ≈ 3125
  • Epoch time (L=1024): ~1.9–2.3 h (incl. overhead)
    Your numbers will vary; for an accurate estimate, measure throughput over ~200 steps after warmup. A quick back-of-envelope check is sketched below.
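The sketch reproduces the example arithmetic above; the sample count and throughput are illustrative placeholders, not measurements:

import math

num_samples     = 100_000   # 🔧 your dataset size
effective_batch = 4 * 8     # per_device_train_batch_size * gradient_accumulation_steps
seq_len         = 1024
throughput      = 15_000    # measured tokens/s after warmup

steps_per_epoch = math.ceil(num_samples / effective_batch)                     # 3125
epoch_hours = steps_per_epoch * effective_batch * seq_len / throughput / 3600  # ~1.9 (compute only)
print(steps_per_epoch, round(epoch_hours, 1))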

🧪 Evaluation

  • Metrics: perplexity (val), exact match / ROUGE-L on held-out prompts
  • Human spot-checks: correctness of differentials, contraindications, and justification clarity
  • Known failure modes:
    • Confident but incorrect rationale (hallucinations)
    • Outdated guidelines
    • Arithmetic slips on edge cases

Add your concrete scores here once computed.

  • Val perplexity: 🔧 …
  • Exact match / ROUGE-L: 🔧 …
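To fill in the perplexity row above, note that perplexity is simply exp of the mean token cross-entropy, which the Trainer reports as eval_loss; a sketch, assuming the trainer and val_ds objects from the training snippet further below:

import math

metrics = trainer.evaluate(eval_dataset=val_ds)
val_ppl = math.exp(metrics["eval_loss"])
print(f"val perplexity: {val_ppl:.2f}")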

🔒 Safety & Ethics

  • The model may produce incorrect or harmful medical content.
  • No PHI was used; the training data come from public, anonymized sources.
  • Add a disclaimer in any downstream app and keep a human in the loop.

🚀 How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "YOUR_ORG/medical-qwen25-0_5b-lora"  # replace with your repo
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = """You are a clinician. Reason step by step.
Patient: 56F with chest pain radiating to jaw, diaphoresis...
Question: Most likely diagnosis and initial management?"""
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
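If the repo hosts only the LoRA adapter weights rather than merged full weights, load it through PEFT instead; a minimal sketch:

from peft import AutoPeftModelForCausalLM

# Resolves the base model from the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# model = model.merge_and_unload()  # optional: merge the adapter for slightly faster inference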




# Training script snippet with LoRA/QLoRA fine-tuning
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          TrainingArguments, Trainer)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

base = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(base)

# QLoRA: 4-bit NF4 quantization is configured via BitsAndBytesConfig,
# not as direct from_pretrained kwargs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                  target_modules=["q_proj","v_proj","o_proj"])
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads for k-bit training
model = get_peft_model(model, lora)
model.gradient_checkpointing_enable()
model.config.use_cache = False  # KV cache is incompatible with gradient checkpointing during training

args = TrainingArguments(
    output_dir="ckpts",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,
    weight_decay=0.01,
    label_smoothing_factor=0.05,
    max_grad_norm=1.0,
    optim="adamw_bnb_8bit",  # bitsandbytes 8-bit AdamW; or "adamw_torch_fused"
    num_train_epochs=3,
    bf16=True,
    save_steps=500,
    evaluation_strategy="steps",
    eval_steps=500,
    logging_steps=50,
    save_safetensors=True,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    gradient_checkpointing=True,
)
# ... Trainer(train_ds, val_ds, callbacks for S3 sync)
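Filling in the placeholder above, a minimal wiring sketch; train_ds / val_ds are the tokenized splits (not shown), and S3SyncCallback refers to the hypothetical callback sketched earlier:

from transformers import EarlyStoppingCallback

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # add S3SyncCallback(...) if syncing
)
trainer.train()
trainer.save_model("ckpts/best")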