Qwen-1.8B-Chat LoRA for Hemiplegia/Stroke Q&A (Associated with Tiansuan AI)
This repository contains LoRA (Low-Rank Adaptation) weights for the Qwen/Qwen-1_8B-Chat model. This model was fine-tuned on a small, custom dataset to answer questions related to hemiplegia, cerebral thrombosis (stroke), and related conditions. This fine-tuning experiment is associated with work at Tiansuan AI.
Model Description
This is a LoRA adapter. To use it, you need to load the base model Qwen/Qwen-1_8B-Chat and then apply these LoRA weights using the PEFT library.
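In outline, usage is two calls: load the base model, then wrap it with the adapter. Below is a minimal sketch; the complete, quantized example with generation is in the "How to Use with PEFT" section.

```python
# Minimal sketch: attach this LoRA adapter to the Qwen base model via PEFT.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "jinv2/qwen-1_8b-hemiplegia-lora")
```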
Fine-tuning Data
The model was fine-tuned on a very small custom dataset of 5 question-answer pairs covering the medical domain of hemiplegia and stroke, and the training run lasted only 20 steps. Because of the extremely limited dataset size and training duration, the model primarily serves as a demonstration of the fine-tuning process: it will likely memorize the training data and generalize poorly to unseen questions.
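For reference, a LoRA run of this scale would typically be configured with PEFT's LoraConfig plus a small number of optimizer steps. The values below are illustrative assumptions for a Qwen-1.8B run, not the published training hyperparameters.

```python
# Illustrative LoRA configuration for a short run like this one (all values are assumptions).
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # low-rank dimension (assumed)
    lora_alpha=16,              # scaling factor (assumed)
    lora_dropout=0.05,          # dropout on LoRA layers (assumed)
    target_modules=["c_attn"],  # Qwen-1 fused attention projection (assumed)
)
# Training would then run for roughly 20 steps, e.g. transformers.TrainingArguments(max_steps=20, ...).
```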
Intended Use
This model is intended for research, educational, and illustrative purposes to demonstrate the LoRA fine-tuning technique for LLMs on specialized, albeit small, datasets. Crucially, this model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition. The outputs of this model should be critically evaluated and not used for any real-world medical decision-making.
How to Use with PEFT

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
# Define model IDs
base_model_id = "Qwen/Qwen-1_8B-Chat"
lora_adapter_id = "jinv2/qwen-1_8b-hemiplegia-lora"  # This LoRA adapter repository
# Set up the quantization configuration (as used during fine-tuning)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16  # Matches the compute dtype used during fine-tuning
)
# Load the base model with quantization
print(f"Loading base model: {base_model_id}...")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
    trust_remote_code=True,
    device_map="auto"  # Automatically place the model on available devices (GPU if available, else CPU)
)
print("Base model loaded.")
# Load the tokenizer
# It's good practice to load the tokenizer from the same repo as the fine-tuned adapter if one was uploaded,
# or to ensure the base tokenizer settings (pad_token, etc.) are consistent.
print(f"Loading tokenizer from: {lora_adapter_id} (or fallback to {base_model_id})...")
try:
    tokenizer = AutoTokenizer.from_pretrained(lora_adapter_id, trust_remote_code=True)
    print(f"Successfully loaded tokenizer from {lora_adapter_id}.")
except Exception:
    print(f"Could not load tokenizer from {lora_adapter_id}, falling back to {base_model_id}.")
    tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
# Set pad_token if not already set (important for Qwen and generation)
if tokenizer.pad_token_id is None:
    if tokenizer.eos_token_id is not None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
        print(f"Set tokenizer.pad_token_id to eos_token_id: {tokenizer.pad_token_id}")
    else:
        # Fallback if eos_token_id is also None (should not happen for Qwen)
        # For Qwen, eos_token_id is typically 151643 (<|endoftext|>)
        # tokenizer.pad_token_id = 151643  # Example; verify Qwen's actual eos_token_id
        print("Warning: pad_token_id and eos_token_id are None. Generation might be problematic.")
tokenizer.padding_side = "left" # Usually preferred for generation
# Load the LoRA adapter onto the base model
print(f"Loading LoRA adapter: {lora_adapter_id}...")
model = PeftModel.from_pretrained(base_model, lora_adapter_id)
model.eval() # Set the model to evaluation mode
print("LoRA adapter loaded and model is ready for inference.")
# --- Inference Example ---
# Since tokenizer.chat_template was not available during the Colab run,
# we manually construct the prompt following Qwen's ChatML format.
system_prompt_content = "你是一个专注于偏瘫、脑血栓、半身不遂领域的医疗问答助手。"  # "You are a medical Q&A assistant focused on hemiplegia, cerebral thrombosis, and related paralysis."
user_query_content = "偏瘫患者的早期康复锻炼有哪些?"  # "What are the early rehabilitation exercises for hemiplegia patients?" (a question from the training set)
prompt = f"<|im_start|>system\n{system_prompt_content}<|im_end|>\n<|im_start|>user\n{user_query_content}<|im_end|>\n<|im_start|>assistant\n"
print(f"\nFormatted Prompt:\n{prompt}")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate response
print("Generating response...")
with torch.no_grad():  # Inference doesn't need gradient calculation
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        pad_token_id=tokenizer.pad_token_id,  # Crucial for generation to avoid warnings/errors
        eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")] if tokenizer.eos_token_id is not None else None,  # Qwen-specific EOS handling
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )
# Decode and print the response
# We need to slice the output to get only the generated part, excluding the input prompt
response_text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(f"\nModel Response:\n{response_text.strip()}")
# Example with a new question
user_query_new = "中风后如何进行语言恢复训练?"  # "How should speech recovery training be carried out after a stroke?"
prompt_new = f"<|im_start|>system\n{system_prompt_content}<|im_end|>\n<|im_start|>user\n{user_query_new}<|im_end|>\n<|im_start|>assistant\n"
inputs_new = tokenizer(prompt_new, return_tensors="pt").to(model.device)
print("\nGenerating response for a new question...")
with torch.no_grad():
    outputs_new = model.generate(
        **inputs_new,
        max_new_tokens=200,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|im_end|>")] if tokenizer.eos_token_id is not None else None,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )
response_text_new = tokenizer.decode(outputs_new[0][inputs_new.input_ids.shape[1]:], skip_special_tokens=True)
print(f"\nModel Response (New Question):\n{response_text_new.strip()}")
License and Attribution
The LoRA adapter weights and this model card are made available under the Apache 2.0 License. Please see the LICENSE file if included, or refer to the Apache 2.0 license text.
The base model Qwen/Qwen-1_8B-Chat is subject to the Tongyi Qianwen LICENSE AGREEMENT.
This fine-tuning experiment is associated with Tiansuan AI. For more information, you can visit https://jinv2.github.io.
Disclaimer
The information provided by this model is for general informational and demonstrative purposes only, and does not constitute medical advice. Always seek the advice of a qualified health professional for any medical concerns. The outputs of this model are based on a very small dataset and should be critically evaluated.