Qwen3-1.7B SFT Model
Model Description
This model is Qwen3-1.7B fine-tuned with Supervised Fine-Tuning (SFT), using FSDP (Fully Sharded Data Parallel) together with QLoRA (Quantized Low-Rank Adaptation).
Training Details
Base Model
- Model: Qwen/Qwen3-1.7B
- Architecture: Transformer-based causal language model
- Parameters: 1.7 billion
Training Configuration
- Method: FSDP + QLoRA
- Quantization: 4-bit QLoRA
- LoRA Parameters:
  - r: 64
  - alpha: 16
  - dropout: 0.1
  - target: linear layers
- Hardware: 8x H100 80GB HBM3
- Precision: bfloat16
- Flash Attention: Enabled
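To make the LoRA settings above concrete, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable adapter parameters for a single linear layer. The 2048 x 2048 layer shape and the helper names are illustrative assumptions, not values taken from this card.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Values from the configuration listed above.
R, ALPHA = 64, 16

# Hypothetical layer shape, chosen only for illustration.
D_IN, D_OUT = 2048, 2048

scaling = lora_scaling(R, ALPHA)
adapter_params = lora_param_count(D_IN, D_OUT, R)
full_params = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params = {adapter_params:,} "
      f"({adapter_params / full_params:.2%} of the full layer)")
```

With r = 64 and alpha = 16 the adapter update is scaled by 0.25, so the effective step size of the adapter is smaller than with the common alpha = 2r convention.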
Training Hyperparameters
- Epochs: 1
- Micro Batch Size: 1
- Gradient Accumulation Steps: 16
- Learning Rate: 1e-4
- Scheduler: Cosine with warmup
- Warmup Ratio: 0.03
- Optimizer: AdamW
- Sequence Length: 1024
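As a rough illustration of what these hyperparameters imply, the sketch below derives the effective batch size and evaluates a linear-warmup, cosine-decay learning rate. It assumes all 8 GPUs act as data-parallel ranks and uses a hypothetical total step count; the exact scheduler implementation used in training is not stated in this card.

```python
import math


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 1e-4, warmup_ratio: float = 0.03) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the lists above; NUM_GPUS assumes every GPU is a data-parallel rank.
MICRO_BATCH, GRAD_ACCUM, NUM_GPUS, SEQ_LEN = 1, 16, 8, 1024

effective_batch = MICRO_BATCH * GRAD_ACCUM * NUM_GPUS  # sequences per optimizer step
tokens_per_step = effective_batch * SEQ_LEN            # upper bound with full packing

TOTAL_STEPS = 1000  # hypothetical, for illustration only

print(f"effective batch size: {effective_batch}")
print(f"tokens per optimizer step (at most): {tokens_per_step}")
for step in (0, 15, 30, 500, 1000):
    print(f"step {step:4d}: lr = {lr_at_step(step, TOTAL_STEPS):.2e}")
```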
Dataset
- Custom SFT dataset (SFT_004_origin_4.parquet)
- Validation split: 10%
- Sample packing enabled for training efficiency
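Sample packing concatenates several short examples into one fixed-length training sequence so that fewer tokens are wasted on padding. The card does not say which packing algorithm was used; the sketch below shows one common approach (first-fit decreasing) with a hypothetical `pack_sequences` helper and made-up sequence lengths.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int = 1024) -> List[List[int]]:
    """Group sequence indices into bins whose total length is at most max_len."""
    # Place the longest sequences first (first-fit decreasing).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    remaining: List[int] = []  # free space left in each bin
    for i in order:
        n = min(lengths[i], max_len)  # over-long sequences count as truncated
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(i)
                remaining[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            remaining.append(max_len - n)
    return bins


# Made-up token counts for six training examples.
lengths = [700, 300, 512, 512, 100, 900]
packed = pack_sequences(lengths)

for bin_indices in packed:
    total = sum(lengths[i] for i in bin_indices)
    print(f"examples {bin_indices} -> {total} / 1024 tokens")
```

In this toy case six examples fit into three packed sequences instead of six padded ones.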
Model Performance
The model was fine-tuned for instruction following on custom tasks and is intended to retain the capabilities of the base Qwen3 model. No benchmark results are reported in this card.
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "u-10bei/qwen3-1.7b-sft-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "u-10bei/qwen3-1.7b-sft-merged",
    trust_remote_code=True,
)

# Chat format
messages = [
    {"role": "user", "content": "Hello! How can I help you today?"}
]

# Format the conversation with the tokenizer's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
Direct Chat Format
# Manual chat formatting
prompt = "<|im_start|>user\nHello! How are you?<|im_end|>\n<|im_start|>assistant\n"

# Tokenize and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)

# Keep special tokens so the raw chat markup is visible in the output
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
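For multi-turn conversations, the manual prompt string can be generated rather than written by hand. The sketch below uses a hypothetical `format_chatml` helper that follows the `<|im_start|>` / `<|im_end|>` layout shown above. The tokenizer's own chat template may add further elements, so `tokenizer.apply_chat_template` is usually the safer choice.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: List[Dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages in the chat layout above."""
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = format_chatml([{"role": "user", "content": "Hello! How are you?"}])
print(repr(prompt))
```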
Special Tokens
- BOS Token: <|im_start|>
- EOS Token: <|im_end|>
- UNK Token: <|endoftext|>
- PAD Token: <|endoftext|>
Technical Specifications
Model Architecture
- Attention: Flash Attention 2 (training and inference)
- Precision: bfloat16 native support
- Context Length: 1024 tokens (training), extensible for inference
- Vocabulary Size: 151,669 tokens
Optimization Features
- Memory Efficient: FSDP sharding reduces memory footprint
- Quantization Ready: QLoRA-compatible for efficient fine-tuning
- Multi-GPU: Optimized for distributed inference
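A back-of-the-envelope estimate shows why quantization and sharding reduce memory. The sketch below counts weight storage only, using the 1.7 billion parameter figure from this card. It ignores activations, optimizer state, LoRA adapters, and quantization metadata, so real usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int,
                     num_shards: int = 1) -> float:
    """Approximate per-device weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / num_shards / 1e9


NUM_PARAMS = 1.7e9  # parameter count stated in this card

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)                     # bfloat16 weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)                      # 4-bit quantized weights
int4_sharded_gb = weight_memory_gb(NUM_PARAMS, 4, num_shards=8)  # sharded over 8 GPUs

print(f"bf16 weights:            {bf16_gb:.2f} GB")
print(f"4-bit weights:           {int4_gb:.2f} GB")
print(f"4-bit, sharded over 8:   {int4_sharded_gb:.3f} GB per GPU")
```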
Training Infrastructure
- Distributed Training: FSDP (Fully Sharded Data Parallel)
- Communication: NCCL with Ethernet backend
- Memory Management: Expandable segments, optimized allocation
- Monitoring: Weights & Biases integration
Limitations
- This model is optimized for the specific training dataset and may not generalize to all use cases
- Context length is limited to 1024 tokens during training
- Performance may vary depending on the specific task and input format
Ethical Considerations
This model inherits the capabilities and limitations of the base Qwen3-1.7B model. Users should be aware of potential biases and use the model responsibly.
Citation
If you use this model, please cite:
@misc{qwen3-1.7b-sft-merged,
  title={Qwen3-1.7B SFT Model with FSDP+QLoRA},
  author={u-10bei},
  year={2025},
  url={https://huggingface.co/u-10bei/qwen3-1.7b-sft-merged}
}
Model Card Authors
- u-10bei
Training Date
August 2025
This model was trained with FSDP + QLoRA on 8x H100 80GB GPUs.