Qwen3-1.7B SFT Model

Model Description

This model is Qwen3-1.7B fine-tuned with Supervised Fine-Tuning (SFT), using FSDP (Fully Sharded Data Parallel) together with QLoRA (Quantized Low-Rank Adaptation).

Training Details

Base Model

  • Model: Qwen/Qwen3-1.7B
  • Architecture: Transformer-based causal language model
  • Parameters: 1.7 billion

Training Configuration

  • Method: FSDP + QLoRA
  • Quantization: 4-bit QLoRA
  • LoRA Parameters:
    • r: 64
    • alpha: 16
    • dropout: 0.1
    • target: linear layers
  • Hardware: 8x H100 80GB HBM3
  • Precision: bfloat16
  • Flash Attention: Enabled
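
As a rough illustration of what the LoRA settings above imply, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters an adapter adds to one linear layer. The 2048 x 2048 layer shape is a hypothetical example chosen for illustration, not a value taken from this card.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update: delta_W = (alpha / r) * B @ A."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 64, 16        # values from the configuration above
D_IN = D_OUT = 2048      # hypothetical layer shape, for illustration only

scaling = lora_scaling(R, ALPHA)
adapter_params = lora_param_count(D_IN, D_OUT, R)
full_params = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params = {adapter_params:,} "
      f"({adapter_params / full_params:.2%} of the full layer)")
```

With alpha smaller than r, the adapter update is scaled down rather than up, which tends to make training less sensitive to the relatively high rank.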

Training Hyperparameters

  • Epochs: 1
  • Micro Batch Size: 1
  • Gradient Accumulation Steps: 16
  • Learning Rate: 1e-4
  • Scheduler: Cosine with warmup
  • Warmup Ratio: 0.03
  • Optimizer: AdamW
  • Sequence Length: 1024
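
The hyperparameters above can be combined into an effective batch size and a learning-rate curve. The sketch below assumes that all 8 GPUs act as data-parallel ranks (as is usual under FSDP), that warmup is linear, and that the cosine phase decays to zero. None of these details are stated in this card, and the exact behaviour depends on the training framework used.

```python
import math

MICRO_BATCH = 1
GRAD_ACCUM = 16
NUM_GPUS = 8            # assumption: every GPU is a data-parallel rank
SEQ_LEN = 1024
PEAK_LR = 1e-4
WARMUP_RATIO = 0.03

# Sequences (and an upper bound on tokens) consumed per optimizer step.
sequences_per_step = MICRO_BATCH * GRAD_ACCUM * NUM_GPUS
max_tokens_per_step = sequences_per_step * SEQ_LEN


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(sequences_per_step, max_tokens_per_step)
# total_steps=1000 is a hypothetical value used only to exercise the function.
print([lr_at(s, 1000) for s in (0, 15, 30, 500, 1000)])
```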

Dataset

  • Custom SFT dataset (SFT_004_origin_4.parquet)
  • Validation split: 10%
  • Sample packing enabled for training efficiency
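
Sample packing concatenates several short examples into one training sequence so that fewer positions are wasted on padding. The sketch below shows one simple way to do this, a first-fit-decreasing bin packer over token counts. It is an illustration of the idea, not the algorithm actually used for this model, and the sample lengths are hypothetical.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int = 1024) -> List[List[int]]:
    """Group sample indices into bins whose total token count fits in max_len."""
    # Place the longest samples first (first-fit-decreasing heuristic).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    free: List[int] = []          # remaining capacity of each bin
    for i in order:
        n = min(lengths[i], max_len)   # over-long samples are assumed truncated
        for b, space in enumerate(free):
            if n <= space:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            free.append(max_len - n)
    return bins


lengths = [700, 300, 512, 512, 100, 24]   # hypothetical token counts
packed = pack_sequences(lengths)
print(f"{len(lengths)} samples packed into {len(packed)} sequences: {packed}")
```

A real implementation also has to keep attention from leaking across packed examples, for instance by resetting position IDs or masking across sample boundaries.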

Model Performance

The model is fine-tuned for instruction following on a custom dataset and is intended to retain the capabilities of the base Qwen3 model. No benchmark or evaluation results are reported in this card.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "u-10bei/qwen3-1.7b-sft-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "u-10bei/qwen3-1.7b-sft-merged",
    trust_remote_code=True
)

# Chat format: a single user turn
messages = [
    {"role": "user", "content": "Hello! How are you?"}
]

# Format conversation
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and move the tensors to the model's device
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens (skip the prompt)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)

Direct Chat Format

# Manual chat formatting
prompt = "<|im_start|>user\nHello! How are you?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>")
)

response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
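
The manual prompt above follows the ChatML layout. As a minimal sketch, the hypothetical helper below builds the same string from a list of messages. It covers only plain role/content turns. `tokenizer.apply_chat_template` remains the authoritative source, since the model's real template may add elements that this helper does not reproduce.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render messages as <|im_start|>role\\ncontent<|im_end|>\\n blocks."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Hello! How are you?"}])
print(repr(prompt))
```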

Special Tokens

  • BOS Token: <|im_start|>
  • EOS Token: <|im_end|>
  • UNK Token: <|endoftext|>
  • PAD Token: <|endoftext|>

Technical Specifications

Model Architecture

  • Attention: Flash Attention 2 (training and inference)
  • Precision: bfloat16 native support
  • Context Length: 1024 tokens (training), extensible for inference
  • Vocabulary Size: 151,669 tokens

Optimization Features

  • Memory Efficient: FSDP sharding reduces memory footprint
  • Quantization Ready: QLoRA-compatible for efficient fine-tuning
  • Multi-GPU: Weights can be placed across available devices at inference with device_map="auto"
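
To give a sense of scale for the memory points above, the sketch below estimates weight memory only, for 1.7 billion parameters in bfloat16 and at roughly 4 bits per parameter, with and without sharding across 8 GPUs. It is a back-of-the-envelope calculation under simplifying assumptions: it ignores activations, gradients, optimizer state, LoRA adapters, quantization constants, and any layers that stay unquantized.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float,
                      num_shards: int = 1) -> float:
    """Approximate per-device weight memory in GiB when sharded evenly."""
    return num_params * bytes_per_param / num_shards / 2**30


NUM_PARAMS = 1.7e9   # parameter count stated in this card
NUM_GPUS = 8

bf16_full = weight_memory_gib(NUM_PARAMS, 2.0)               # 2 bytes per parameter
bf16_sharded = weight_memory_gib(NUM_PARAMS, 2.0, NUM_GPUS)
nf4_sharded = weight_memory_gib(NUM_PARAMS, 0.5, NUM_GPUS)   # about 4 bits per parameter

print(f"bf16, one device:     {bf16_full:.2f} GiB")
print(f"bf16, sharded over 8: {bf16_sharded:.2f} GiB per GPU")
print(f"4-bit, sharded over 8: {nf4_sharded:.2f} GiB per GPU")
```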

Training Infrastructure

  • Distributed Training: FSDP (Fully Sharded Data Parallel)
  • Communication: NCCL with Ethernet backend
  • Memory Management: Expandable segments, optimized allocation
  • Monitoring: Weights & Biases integration
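
The allocator and communication settings above are typically controlled through environment variables. The sketch below shows one plausible setup, with `training_env` being a hypothetical helper. `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is PyTorch's switch for expandable segments. `NCCL_IB_DISABLE=1` is an assumption about how NCCL might be kept on Ethernet (TCP sockets). The values actually used for this training run are not given in this card.

```python
import os


def training_env() -> dict:
    """Environment variables for the training processes (illustrative values)."""
    return {
        # Lets the CUDA caching allocator grow segments, reducing fragmentation.
        "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
        # Assumption: disable InfiniBand so NCCL falls back to TCP sockets.
        "NCCL_IB_DISABLE": "1",
    }


# Apply before CUDA is initialised and before the process group is created.
os.environ.update(training_env())
```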

Limitations

  • This model is optimized for the specific training dataset and may not generalize to all use cases
  • Context length is limited to 1024 tokens during training
  • Performance may vary depending on the specific task and input format

Ethical Considerations

This model inherits the capabilities and limitations of the base Qwen3-1.7B model. Users should be aware of potential biases and use the model responsibly.

Citation

If you use this model, please cite:

@model{qwen3-1.7b-sft-merged,
  title={Qwen3-1.7B SFT Model with FSDP+QLoRA},
  author={u-10bei},
  year={2025},
  url={https://huggingface.co/u-10bei/qwen3-1.7b-sft-merged}
}

Model Card Authors

  • u-10bei

Training Date

August 2025


This model was trained with FSDP + QLoRA on 8x H100 GPUs.
