GPT-5-Distill-Qwen3-4B-Instruct-2507

1. Model Overview

Model Type: Instruction-tuned conversational LLM
Supports LoRA adapters and fully fine-tuned models for inference

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Parameters: 4B
  • Training Method:
    • Supervised Fine-Tuning (SFT) on ShareGPT data
    • Knowledge distillation from LMSYS GPT-5 responses
  • Supported Languages: Chinese, English, mixed inputs/outputs
  • Max Context Length: Up to 32K tokens (max_seq_length = 32768)

This model is trained on the ShareGPT-Qwen3 instruction dataset and distilled toward the conversational style and quality of GPT-5. It aims to produce high-quality, natural-sounding dialogue at low computational cost, making it a good fit for lightweight applications that still need responsive generation.
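
Since this card accompanies the GGUF build, one quick way to try the model locally is llama-cpp-python. The sketch below is a minimal example, not an official quick-start: the repo id matches this card, but the quantization filename is an assumption and should be replaced with one of the GGUF files actually published in the repo.

```python
# Minimal local chat sketch with llama-cpp-python (pip install llama-cpp-python).
# The quant filename pattern is an assumption; pick any .gguf file listed in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jackrong/GPT-5-Distill-Qwen3-4B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quantization
    n_ctx=8192,               # the model supports up to 32K tokens if memory allows
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful bilingual (Chinese/English) assistant."},
        {"role": "user", "content": "Explain knowledge distillation in two sentences."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```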


2. Intended Use Cases

✅ Recommended:

  • Casual chat in Chinese/English
  • General knowledge explanations & reasoning guidance
  • Code suggestions and simple debugging tips
  • Writing assistance: editing, summarizing, rewriting
  • Role-playing conversations (with well-designed prompts; see the sketch below)
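
For the role-playing item above, most of the quality comes from a clear system prompt. A hypothetical message list (usable with the create_chat_completion call shown earlier, or any chat-template-aware runtime):

```python
# Hypothetical role-play setup: the persona and constraints live in the system
# message, and every turn is appended to the same list to preserve context.
messages = [
    {
        "role": "system",
        "content": (
            "You are Lin, a patient programming mentor. Stay in character, "
            "reply in the language the user writes in (Chinese or English), "
            "and keep answers under 150 words."
        ),
    },
    {"role": "user", "content": "Lin, what is a hash map and when should I use one?"},
]
# reply = llm.create_chat_completion(messages=messages, max_tokens=256)
# messages.append(reply["choices"][0]["message"])  # carry context into the next turn
```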

⚠️ Not Suitable For:

  • High-risk decision-making:
    • Medical diagnosis, mental health support
    • Legal advice, financial investment recommendations
  • Real-time factual tasks (e.g., news, stock updates)
  • Authoritative judgment on sensitive topics

Note: Outputs are for reference only and should not be used as the sole basis for critical decisions.


3. Training Data & Distillation Process

Key Datasets:

(1) ds1: ShareGPT-Qwen3 Instruction Dataset

  • Source: Jackrong/ShareGPT-Qwen3-235B-A22B-Instuct-2507
  • Purpose:
    • Provides diverse instruction-response pairs
    • Supports multi-turn dialogues and context awareness
  • Processing:
    • Cleaned for quality and relevance
    • Standardized into instruction, input, output format (see the conversion sketch below)
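
A rough sketch of the standardization step described above, assuming each cleaned record carries instruction, input, and output fields (the field names follow the description here, not a published schema):

```python
# Sketch: map an (instruction, input, output) record to the chat format used for SFT.
# Field names are assumed from the description above; adjust to the actual schema.
def to_chat_example(record: dict) -> dict:
    user_turn = record["instruction"]
    if record.get("input"):
        user_turn += "\n\n" + record["input"]
    return {
        "messages": [
            {"role": "user", "content": user_turn},
            {"role": "assistant", "content": record["output"]},
        ]
    }
```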

(2) ds2: LMSYS GPT-5 Teacher Response Data

  • Source: ytz20/LMSYS-Chat-GPT-5-Chat-Response
  • Filtering:
    • Only kept samples with flaw == "normal" (reproduced in the snippet below)
    • Removed hallucinations and inconsistent responses
  • Purpose:
    • Distillation target for conversational quality
    • Enhances clarity, coherence, and fluency
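
The flaw-based filtering can be reproduced roughly with the Hugging Face datasets library; the dataset id and the flaw column come from this card, while the split name and any further cleaning are assumptions.

```python
# Keep only teacher responses labeled flaw == "normal".
from datasets import load_dataset

ds2 = load_dataset("ytz20/LMSYS-Chat-GPT-5-Chat-Response", split="train")  # split name assumed
ds2 = ds2.filter(lambda ex: ex["flaw"] == "normal")
print(f"{len(ds2)} teacher samples kept")
```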

Training Flow (a minimal training sketch follows the list):

  1. Prepare unified Chat-formatted dataset
  2. Fine-tune base Qwen3-4B-Instruct-2507 via SFT
  3. Conduct knowledge distillation using the GPT-5 responses labeled "normal" as teacher outputs
  4. Balance style imitation with semantic fidelity to ensure robustness
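
A minimal sketch of steps 1 and 2 with TRL's SFTTrainer, assuming both datasets have already been converted to the unified messages format shown earlier; every hyperparameter below is an illustrative placeholder, not the configuration actually used for this model.

```python
# Illustrative SFT sketch with TRL; the real training configuration is not
# published on this card, so the hyperparameters are placeholders.
from datasets import concatenate_datasets, load_dataset
from trl import SFTConfig, SFTTrainer

# ds1/ds2 after conversion to the {"messages": [...]} chat format
ds1_chat = load_dataset("json", data_files="ds1_chat.jsonl", split="train")
ds2_chat = load_dataset("json", data_files="ds2_chat.jsonl", split="train")
train_ds = concatenate_datasets([ds1_chat, ds2_chat]).shuffle(seed=42)

config = SFTConfig(
    output_dir="gpt5-distill-qwen3-4b",
    num_train_epochs=1,                # placeholder
    per_device_train_batch_size=1,     # placeholder
    gradient_accumulation_steps=16,    # placeholder
    learning_rate=2e-5,                # placeholder
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",  # base model named on this card
    train_dataset=train_ds,
    args=config,
)
trainer.train()
```

In this sketch the distillation is sequence-level: the filtered GPT-5 responses simply serve as SFT targets, which matches step 3 above.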

⚖️ Note: This work is based on publicly available, non-sensitive datasets and uses them responsibly under fair use principles.


4. Key Features Summary

| Feature | Description |
| --- | --- |
| Lightweight | ~4B parameter model – fast inference, low resource usage |
| Distillation-Style Responses | Mimics GPT-5’s conversational fluency and helpfulness |
| Highly Conversational | Excellent for chatbot-style interactions with rich dialogue flow |
| Multilingual Ready | Seamless support for Chinese and English |

5. Acknowledgements

We thank:

  • LMSYS team for sharing GPT-5 response data
  • Jackrong for the ShareGPT-Qwen3 dataset
  • Qwen team for releasing Qwen3-4B-Instruct-2507

This project is an open research effort aimed at making high-quality conversational AI accessible with smaller models.

