Qwen2.5-3B User Turn Prediction (QLoRA Fine-tuned)
Model Description
This model is a QLoRA fine-tuned version of Qwen/Qwen2.5-3B-Instruct trained specifically for user turn prediction in multi-turn dialogues. Unlike traditional dialogue systems that predict assistant responses, this model predicts the next user utterance given the conversation context.
Key Innovation: Inverts the traditional dialogue modeling task by focusing on user behavior prediction rather than system response generation.
Model Details
- Base Model: Qwen2.5-3B-Instruct (3B parameters)
- Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
- Quantization: 4-bit NF4 with double quantization
- Training Examples: 800 conversation pairs
- Evaluation Examples: 40 conversation pairs
- Domains: Open-domain (WildChat) + Task-oriented (Schema-Guided Dialogue)
Visual Performance Analysis
Relative Performance Improvements by Domain
Figure 1: Relative performance changes across metrics and dialogue domains
Baseline Configuration Comparison
Figure 2: Comparison of fine-tuned model against different baseline configurations
Usage
Installation
pip install transformers peft torch accelerate bitsandbytes
Quick Start
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LogitsProcessorList,
    NoBadWordsLogitsProcessor,
)
from peft import PeftModel
# 4-bit NF4 quantization, matching the training-time QLoRA setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
# Load the base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/qwen_userturn_lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
# Prepare conversation context
conversation = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"},
]
# Tokenize the context without adding an assistant generation prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=False,
    return_tensors="pt",
).to(model.device)

# Append the opening of a user turn so the model continues as the user
user_open_tokens = tokenizer.encode(
    "<|im_start|>user\n", add_special_tokens=False, return_tensors="pt"
).to(model.device)
input_ids = torch.cat([inputs, user_open_tokens], dim=-1)
attention_mask = torch.ones_like(input_ids)
input_len = int(input_ids.shape[1])
# Ban role-header sequences so generation cannot open a new assistant/system/user turn
bad_words_ids = tokenizer(
    ["<|im_start|>assistant", "<|im_start|>system", "<|im_start|>user"],
    add_special_tokens=False,
)["input_ids"]
logits_processors = LogitsProcessorList(
    [NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id=tokenizer.eos_token_id)]
)
# Generate the predicted user turn
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        logits_processor=logits_processors,
    )

# Decode only the newly generated tokens
predicted_user_turn = tokenizer.decode(
    outputs[0][input_len:],
    skip_special_tokens=True,
)
print(f"Predicted user turn: {predicted_user_turn}")
Training Details
Dataset
Training Set (800 examples):
- 400 from WildChat-1M (open-domain)
- 400 from Schema-Guided Dialogue (task-oriented)
Evaluation Set (40 examples):
- 20 from WildChat-1M
- 20 from Schema-Guided Dialogue
Selection Criteria (see the filtering sketch after this list):
- Minimum 2 turns per conversation
- English language only (WildChat)
- Valid assistant-user turn pairs
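A minimal sketch of a filter implementing these selection criteria. The field names ("turns", "language") and the helper name is_valid_example are illustrative placeholders, not the actual preprocessing code or dataset schema:

def is_valid_example(conversation):
    """Return True if a conversation satisfies the selection criteria above."""
    turns = conversation["turns"]  # list of {"role": ..., "content": ...} dicts
    # Minimum 2 turns per conversation
    if len(turns) < 2:
        return False
    # English language only (relevant for WildChat)
    if conversation.get("language", "English") != "English":
        return False
    # Require at least one assistant turn immediately followed by a user turn
    return any(
        prev["role"] == "assistant" and nxt["role"] == "user"
        for prev, nxt in zip(turns, turns[1:])
    )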
Training Configuration
# QLoRA Configuration
LoRA Rank: 16
LoRA Alpha: 32
LoRA Dropout: 0.01
Target Modules: [
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
]
# Quantization
Load in 4-bit: True
BnB 4-bit Compute Dtype: float16
BnB 4-bit Quant Type: nf4
BnB 4-bit Use Double Quant: True
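As a point of reference, the listing above maps onto peft/transformers objects roughly as follows. This is a hedged reconstruction, not the actual training script; bias="none" and task_type="CAUSAL_LM" are assumptions not stated in this card:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter over attention and MLP projections, as listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: standard for decoder-only LMs
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()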
Evaluation Methodology
Metrics
- BERTScore-F1: Semantic similarity using contextualized embeddings
- BLEURT: Learned metric trained on human judgments
- Perplexity: Model confidence, i.e. the exponentiated average token negative log-likelihood (lower is better)
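A minimal evaluation sketch under these metrics, assuming the bert-score package for BERTScore-F1 and the model's own loss for perplexity. The example strings are placeholders, BLEURT is omitted because it requires a separately downloaded checkpoint, and this is not necessarily the thesis' exact evaluation pipeline:

import math
import torch
from bert_score import score as bertscore

# Placeholder prediction/reference pair for illustration only
predictions = ["I'd like Italian food, please."]
references = ["Something Italian would be great."]

# BERTScore-F1: semantic similarity between prediction and reference
P, R, F1 = bertscore(predictions, references, lang="en")
print("BERTScore-F1:", F1.mean().item())

def perplexity(model, tokenizer, text):
    """Exponentiated average token loss of `text` under the already-loaded model."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())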
Intended Use
Primary Use Cases
- ✅ Research on dialogue systems and user behavior modeling
- ✅ User simulation for dialogue system evaluation
- ✅ Conversational AI analysis and understanding
- ✅ Synthetic dialogue generation for training data augmentation
- ✅ User intent prediction in multi-turn contexts
Out-of-Scope Use Cases
- ❌ Production deployment without safety guardrails
- ❌ Real-time user profiling or surveillance
- ❌ Generating harmful or manipulative content
- ❌ Non-English dialogue prediction (untested)
Citation
If you use this model in your research, please cite:
@bachelorsthesis{sebastianboehler2025userturn,
  title={To what extent can open-source Large Language Models predict the next user turn in multi-turn dialogues across open-domain and task-oriented settings?},
  author={Sebastian Boehler},
  school={IU International University of Applied Sciences},
  year={2025},
  type={Bachelor's Thesis},
  note={Model: qwen2.5-3b-dialogue-userturn-lora}
}
Base Model Citation
@article{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024}
}
Model Card Authors
Sebastian Boehler - IU International University of Applied Sciences