Qwen2.5-3B User Turn Prediction (QLoRA Fine-tuned)

Model Description

This model is a QLoRA fine-tuned version of Qwen/Qwen2.5-3B-Instruct specifically trained for user turn prediction in multi-turn dialogues. Unlike traditional dialogue systems that predict assistant responses, this model predicts the next user utterance given conversation context.

Key Innovation: Inverts the traditional dialogue modeling task by focusing on user behavior prediction rather than system response generation.
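
To make the task concrete, the sketch below shows one way a single context/target pair could be framed with the Qwen chat template; the example conversation, target utterance, and label construction are illustrative assumptions, not the card's actual preprocessing.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Context: everything up to and including the last assistant turn
context = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"},
]

# Hypothetical target: the next user utterance the model learns to predict
target_user_turn = "Italian, please. Somewhere mid-priced."

# Render the context, then open a new user turn; the training label would be
# the target text inside that user turn.
prompt = tokenizer.apply_chat_template(context, tokenize=False) + "<|im_start|>user\n"
label_text = target_user_turn + "<|im_end|>\n"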

Model Details

  • Base Model: Qwen2.5-3B-Instruct (3B parameters)
  • Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit NF4 with double quantization
  • Training Examples: 800 conversation pairs
  • Evaluation Examples: 40 conversation pairs
  • Domains: Open-domain (WildChat) + Task-oriented (Schema-Guided Dialogue)

Visual Performance Analysis

Relative Performance Improvements by Domain

Figure 1: Relative performance changes across metrics and dialogue domains

Baseline Configuration Comparison

Figure 2: Comparison of the fine-tuned model against different baseline configurations

Usage

Installation

pip install transformers peft torch accelerate bitsandbytes

Quick Start

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LogitsProcessorList,
    NoBadWordsLogitsProcessor,
)
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "path/to/qwen_userturn_lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Prepare conversation context
conversation = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"}
]

inputs = tokenizer.apply_chat_template(
    conversation,
    return_tensors="pt",
    tokenize=True,
    add_generation_prompt=False
).to(model.device)

user_open_tokens = tokenizer.encode("<|im_start|>user\n", add_special_tokens=False, return_tensors="pt").to(model.device)

# Append the user-turn opener so generation starts inside a new user turn
input_ids = torch.cat([inputs, user_open_tokens], dim=-1)
attention_mask = torch.ones_like(input_ids)
input_len = int(input_ids.shape[1])

# Block role-opening sequences so the model stays inside the user turn
bad = tokenizer(
    ["<|im_start|>assistant", "<|im_start|>system", "<|im_start|>user"],
    add_special_tokens=False
)["input_ids"]
logits_processors = LogitsProcessorList(
    [NoBadWordsLogitsProcessor(bad, eos_token_id=tokenizer.eos_token_id)]
)

# Generate prediction
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        attention_mask=attention_mask,
        logits_processor=logits_processors
    )

predicted_user_turn = tokenizer.decode(
    outputs[0][input_len:],
    skip_special_tokens=True
)

print(f"Predicted user turn: {predicted_user_turn}")

Training Details

Dataset

Training Set (800 examples):

Evaluation Set (40 examples):

  • 20 from WildChat-1M
  • 20 from Schema-Guided Dialogue

Selection Criteria:

  • Minimum 2 turns per conversation
  • English language only (WildChat)
  • Valid assistant-user turn pairs
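
As a rough illustration of these criteria, the filter below shows how conversations might be screened; the record layout (a "turns" list of role/content dicts plus a "language" field) is an assumed format, not the actual preprocessing code.

def keep_conversation(conv: dict) -> bool:
    """Illustrative filter mirroring the selection criteria above (assumed record format)."""
    turns = conv.get("turns", [])
    # Minimum 2 turns per conversation
    if len(turns) < 2:
        return False
    # English language only (relevant for WildChat)
    if conv.get("language", "English") != "English":
        return False
    # Require at least one assistant turn immediately followed by a user turn
    return any(
        a["role"] == "assistant" and b["role"] == "user"
        for a, b in zip(turns, turns[1:])
    )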

Training Configuration

# QLoRA Configuration
LoRA Rank: 16
LoRA Alpha: 32
LoRA Dropout: 0.01
Target Modules: [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]

# Quantization
Load in 4-bit: True
BnB 4-bit Compute Dtype: float16
BnB 4-bit Quant Type: nf4
BnB 4-bit Use Double Quant: True
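
Assuming the standard bitsandbytes and peft APIs, the configuration above corresponds roughly to the following setup; this is a sketch, not the exact training script.

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization and float16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter over the attention and MLP projections listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)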

Evaluation Methodology

Metrics

  1. BERTScore-F1: Semantic similarity using contextualized embeddings
  2. BLEURT: Learned metric trained on human judgments
  3. Perplexity: Model confidence, computed as the exponentiated average negative log-likelihood (lower is better)
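
The sketch below shows how BERTScore-F1 and perplexity might be computed for a predicted vs. reference user turn, assuming the bert-score package and using the fine-tuned model for perplexity; BLEURT requires its own checkpoint and is omitted here.

import torch
from bert_score import score as bertscore

def bertscore_f1(prediction: str, reference: str) -> float:
    # BERTScore-F1 between a predicted and a reference user turn
    _, _, f1 = bertscore([prediction], [reference], lang="en")
    return f1.item()

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity of `text` under the model: exp of the mean token negative log-likelihood
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()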

Intended Use

Primary Use Cases

βœ… Research on dialogue systems and user behavior modeling
βœ… User simulation for dialogue system evaluation
βœ… Conversational AI analysis and understanding
βœ… Synthetic dialogue generation for training data augmentation
βœ… User intent prediction in multi-turn contexts

Out-of-Scope Use Cases

❌ Production deployment without safety guardrails
❌ Real-time user profiling or surveillance
❌ Generating harmful or manipulative content
❌ Non-English dialogue prediction (untested)

Citation

If you use this model in your research, please cite:

@thesis{sebastianboehler2025userturn,
  title={To what extent can open-source Large Language Models predict the next user turn in multi-turn dialogues across open-domain and task-oriented settings?},
  author={Sebastian Boehler},
  school={IU International University of Applied Sciences},
  year={2025},
  type={Bachelor's Thesis},
  note={Model: qwen2.5-3b-dialogue-userturn-lora}
}

Base Model Citation

@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  url={https://qwenlm.github.io/blog/qwen2.5/},
  year={2024}
}

Model Card Authors

Sebastian Boehler - IU International University of Applied Sciences
