GPT-5-Distill-Qwen3-4B-Instruct-2507

1. Model Overview

Model Type: Instruction-tuned conversational LLM
Supports LoRA adapters and fully fine-tuned models for inference

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Parameters: 4B
  • Training Method:
    • Supervised Fine-Tuning (SFT) on ShareGPT data
    • Knowledge distillation from LMSYS GPT-5 responses
  • Supported Languages: Chinese, English, mixed inputs/outputs
  • Max Context Length: Up to 32K tokens (max_seq_length = 32768)

This model is trained on the ShareGPT-Qwen3 instruction dataset and distilled toward the conversational style and quality of GPT-5. It aims to produce high-quality, natural-sounding dialogue at low computational cost, making it a good fit for lightweight applications that still need responsive generation.
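
Since this card accompanies the GGUF build, one quick way to try the model locally is llama-cpp-python. The sketch below is a minimal example, not an official quick-start: the repo id matches this card, but the quantization filename is an assumption and should be replaced with one of the GGUF files actually published in the repo.

```python
# Minimal local chat sketch with llama-cpp-python (pip install llama-cpp-python).
# The quant filename pattern is an assumption; pick any .gguf file listed in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jackrong/GPT-5-Distill-Qwen3-4B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization
    n_ctx=8192,               # the model supports up to 32K tokens if memory allows
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful bilingual (Chinese/English) assistant."},
        {"role": "user", "content": "Explain knowledge distillation in two sentences."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```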


2. Intended Use Cases

✅ Recommended:

  • Casual chat in Chinese/English
  • General knowledge explanations & reasoning guidance
  • Code suggestions and simple debugging tips
  • Writing assistance: editing, summarizing, rewriting
  • Role-playing conversations (with well-designed prompts; see the sketch below)
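
For the role-playing item above, most of the quality comes from a clear system prompt. A hypothetical message list (usable with the create_chat_completion call shown earlier, or any chat-template-aware runtime):

```python
# Hypothetical role-play setup: the persona and constraints live in the system
# message, and every turn is appended to the same list to preserve context.
messages = [
    {
        "role": "system",
        "content": (
            "You are Lin, a patient programming mentor. Stay in character, "
            "reply in the language the user writes in (Chinese or English), "
            "and keep answers under 150 words."
        ),
    },
    {"role": "user", "content": "Lin, what is a hash map and when should I use one?"},
]
# reply = llm.create_chat_completion(messages=messages, max_tokens=256)
# messages.append(reply["choices"][0]["message"])  # carry context into the next turn
```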

⚠️ Not Suitable For:

  • High-risk decision-making:
    • Medical diagnosis, mental health support
    • Legal advice, financial investment recommendations
  • Real-time factual tasks (e.g., news, stock updates)
  • Authoritative judgment on sensitive topics

Note: Outputs are for reference only and should not be used as the sole basis for critical decisions.


3. Training Data & Distillation Process

Key Datasets:

(1) ds1: ShareGPT-Qwen3 Instruction Dataset

  • Source: Jackrong/ShareGPT-Qwen3-235B-A22B-Instuct-2507
  • Purpose:
    • Provides diverse instruction-response pairs
    • Supports multi-turn dialogues and context awareness
  • Processing:
    • Cleaned for quality and relevance
    • Standardized into instruction, input, output format (see the conversion sketch below)
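
A rough sketch of the standardization step described above, assuming each cleaned record carries instruction, input, and output fields (the field names follow the description here, not a published schema):

```python
# Sketch: map an (instruction, input, output) record to the chat format used for SFT.
# Field names are assumed from the description above; adjust to the actual schema.
def to_chat_example(record: dict) -> dict:
    user_turn = record["instruction"]
    if record.get("input"):
        user_turn += "\n\n" + record["input"]
    return {
        "messages": [
            {"role": "user", "content": user_turn},
            {"role": "assistant", "content": record["output"]},
        ]
    }
```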

(2) ds2: LMSYS GPT-5 Teacher Response Data

  • Source: ytz20/LMSYS-Chat-GPT-5-Chat-Response
  • Filtering:
    • Only kept samples with flaw == "normal" (reproduced in the snippet below)
    • Removed hallucinations and inconsistent responses
  • Purpose:
    • Distillation target for conversational quality
    • Enhances clarity, coherence, and fluency
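
The flaw-based filtering can be reproduced roughly with the Hugging Face datasets library; the dataset id and the flaw column come from this card, while the split name and any further cleaning are assumptions.

```python
# Keep only teacher responses labeled flaw == "normal".
from datasets import load_dataset

ds2 = load_dataset("ytz20/LMSYS-Chat-GPT-5-Chat-Response", split="train")  # split name assumed
ds2 = ds2.filter(lambda ex: ex["flaw"] == "normal")
print(f"{len(ds2)} teacher samples kept")
```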

Training Flow (a minimal training sketch follows the list):

  1. Prepare unified Chat-formatted dataset
  2. Fine-tune base Qwen3-4B-Instruct-2507 via SFT
  3. Conduct knowledge distillation using the GPT-5 responses labeled "normal" as teacher outputs
  4. Balance style imitation with semantic fidelity to ensure robustness
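
A minimal sketch of steps 1 and 2 with TRL's SFTTrainer, assuming both datasets have already been converted to the unified messages format shown earlier; every hyperparameter below is an illustrative placeholder, not the configuration actually used for this model.

```python
# Illustrative SFT sketch with TRL; the real training configuration is not
# published on this card, so the hyperparameters are placeholders.
from datasets import concatenate_datasets, load_dataset
from trl import SFTConfig, SFTTrainer

# ds1/ds2 after conversion to the {"messages": [...]} chat format
ds1_chat = load_dataset("json", data_files="ds1_chat.jsonl", split="train")
ds2_chat = load_dataset("json", data_files="ds2_chat.jsonl", split="train")
train_ds = concatenate_datasets([ds1_chat, ds2_chat]).shuffle(seed=42)

config = SFTConfig(
    output_dir="gpt5-distill-qwen3-4b",
    num_train_epochs=1,                # placeholder
    per_device_train_batch_size=1,     # placeholder
    gradient_accumulation_steps=16,    # placeholder
    learning_rate=2e-5,                # placeholder
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",  # base model named on this card
    train_dataset=train_ds,
    args=config,
)
trainer.train()
```

In this sketch the distillation is sequence-level: the filtered GPT-5 responses simply serve as SFT targets, which matches step 3 above.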

⚖️ Note: This work is based on publicly available, non-sensitive datasets and uses them responsibly under fair use principles.


4. Key Features Summary

| Feature | Description |
| --- | --- |
| Lightweight | ~4B parameter model – fast inference, low resource usage |
| Distillation-Style Responses | Mimics GPT-5’s conversational fluency and helpfulness |
| Highly Conversational | Excellent for chatbot-style interactions with rich dialogue flow |
| Multilingual Ready | Seamless support for Chinese and English |

5. Acknowledgements

We thank:

  • LMSYS team for sharing GPT-5 response data
  • Jackrong for the ShareGPT-Qwen3 dataset
  • Qwen team for releasing Qwen3-4B-Instruct-2507

This project is an open research effort aimed at making high-quality conversational AI accessible with smaller models.

