Qwen2.5-3B User Turn Prediction (QLoRA Fine-tuned)

Model Description

This model is a QLoRA fine-tuned version of Qwen/Qwen2.5-3B-Instruct specifically trained for user turn prediction in multi-turn dialogues. Unlike traditional dialogue systems that predict assistant responses, this model predicts the next user utterance given conversation context.

Key Innovation: Inverts the traditional dialogue modeling task by focusing on user behavior prediction rather than system response generation.
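
To make the task concrete, the sketch below shows one way a single context/target pair could be framed with the Qwen chat template; the example conversation, target utterance, and label construction are illustrative assumptions, not the card's actual preprocessing.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Context: everything up to and including the last assistant turn
context = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"},
]

# Hypothetical target: the next user utterance the model learns to predict
target_user_turn = "Italian, please. Somewhere mid-priced."

# Render the context, then open a new user turn; the training label would be
# the target text inside that user turn.
prompt = tokenizer.apply_chat_template(context, tokenize=False) + "<|im_start|>user\n"
label_text = target_user_turn + "<|im_end|>\n"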

Model Details

  • Base Model: Qwen2.5-3B-Instruct (3B parameters)
  • Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit NF4 with double quantization
  • Training Examples: 800 conversation pairs
  • Evaluation Examples: 40 conversation pairs
  • Domains: Open-domain (WildChat) + Task-oriented (Schema-Guided Dialogue)

Visual Performance Analysis

Relative Performance Improvements by Domain

Figure 1: Relative performance changes across metrics and dialogue domains

Baseline Configuration Comparison

Figure 2: Comparison of the fine-tuned model against different baseline configurations

Usage

Installation

pip install transformers peft torch accelerate bitsandbytes

Quick Start

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LogitsProcessorList,
    NoBadWordsLogitsProcessor,
)
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/qwen_userturn_lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Prepare conversation context
conversation = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"}
]

inputs = tokenizer.apply_chat_template(
    conversation,
    return_tensors="pt",
    tokenize=True,
    add_generation_prompt=False
).to(model.device)

user_open_tokens = tokenizer.encode("<|im_start|>user\n", add_special_tokens=False, return_tensors="pt").to(model.device)

# Append the user-turn opener so generation starts inside a new user turn
input_ids = torch.cat([inputs, user_open_tokens], dim=-1)
attention_mask = torch.ones_like(input_ids)
input_len = int(input_ids.shape[1])

# Block role-opening sequences so the model stays inside the user turn
bad = tokenizer(
    ["<|im_start|>assistant", "<|im_start|>system", "<|im_start|>user"],
    add_special_tokens=False
)["input_ids"]
logits_processors = LogitsProcessorList(
    [NoBadWordsLogitsProcessor(bad, eos_token_id=tokenizer.eos_token_id)]
)

# Generate prediction
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        attention_mask=attention_mask,
        logits_processor=logits_processors
    )

predicted_user_turn = tokenizer.decode(
    outputs[0][input_len:],
    skip_special_tokens=True
)

print(f"Predicted user turn: {predicted_user_turn}")

Training Details

Dataset

Training Set (800 examples):

Evaluation Set (40 examples):

  • 20 from WildChat-1M
  • 20 from Schema-Guided Dialogue

Selection Criteria:

  • Minimum 2 turns per conversation
  • English language only (WildChat)
  • Valid assistant-user turn pairs
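
As a rough illustration of these criteria, the filter below shows how conversations might be screened; the record layout (a "turns" list of role/content dicts plus a "language" field) is an assumed format, not the actual preprocessing code.

def keep_conversation(conv: dict) -> bool:
    """Illustrative filter mirroring the selection criteria above (assumed record format)."""
    turns = conv.get("turns", [])
    # Minimum 2 turns per conversation
    if len(turns) < 2:
        return False
    # English language only (relevant for WildChat)
    if conv.get("language", "English") != "English":
        return False
    # Require at least one assistant turn immediately followed by a user turn
    return any(
        a["role"] == "assistant" and b["role"] == "user"
        for a, b in zip(turns, turns[1:])
    )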

Training Configuration

# QLoRA Configuration
LoRA Rank: 16
LoRA Alpha: 32
LoRA Dropout: 0.01
Target Modules: [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]

# Quantization
Load in 4-bit: True
BnB 4-bit Compute Dtype: float16
BnB 4-bit Quant Type: nf4
BnB 4-bit Use Double Quant: True
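
Assuming the standard bitsandbytes and peft APIs, the configuration above corresponds roughly to the following setup; this is a sketch, not the exact training script.

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization and float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter over the attention and MLP projections listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)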

Evaluation Methodology

Metrics

  1. BERTScore-F1: Semantic similarity using contextualized embeddings
  2. BLEURT: Learned metric trained on human judgments
  3. Perplexity: Model confidence, computed as the exponentiated average negative log-likelihood (lower is better)
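
The sketch below shows how BERTScore-F1 and perplexity might be computed for a predicted vs. reference user turn, assuming the bert-score package and using the fine-tuned model for perplexity; BLEURT requires its own checkpoint and is omitted here.

import torch
from bert_score import score as bertscore

def bertscore_f1(prediction: str, reference: str) -> float:
    # BERTScore-F1 between a predicted and a reference user turn
    _, _, f1 = bertscore([prediction], [reference], lang="en")
    return f1.item()

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity of `text` under the model: exp of the mean token negative log-likelihood
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()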

Intended Use

Primary Use Cases

βœ… Research on dialogue systems and user behavior modeling
βœ… User simulation for dialogue system evaluation
βœ… Conversational AI analysis and understanding
βœ… Synthetic dialogue generation for training data augmentation
βœ… User intent prediction in multi-turn contexts

Out-of-Scope Use Cases

❌ Production deployment without safety guardrails
❌ Real-time user profiling or surveillance
❌ Generating harmful or manipulative content
❌ Non-English dialogue prediction (untested)

Citation

If you use this model in your research, please cite:

@thesis{sebastianboehler2025userturn,
  title={To what extent can open-source Large Language Models predict the next user turn in multi-turn dialogues across open-domain and task-oriented settings?},
  author={Sebastian Boehler},
  school={IU International University of Applied Sciences},
  year={2025},
  type={Bachelor's Thesis},
  note={Model: qwen2.5-3b-dialogue-userturn-lora}
}

Base Model Citation

@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  url={https://qwenlm.github.io/blog/qwen2.5/},
  year={2024}
}

Model Card Authors

Sebastian Boehler - IU International University of Applied Sciences
