Qwen 2.5 3B - QNN Ready / Qwen 2.5 3B - QNN対応
English
Model Overview
This repository contains the original Qwen 2.5 3B model prepared for QNN deployment and optimization. The model is unmodified and ready for conversion to various formats including ONNX and QNN.
Model Details
- Base Model: Qwen/Qwen2.5-3B
- Architecture: Qwen2ForCausalLM
- Parameters: ~3B
- Languages: English, Chinese, and others
- Format: PyTorch (Safetensors)
- Size: ~6.17GB
Features
- ✅ Original Model: Unmodified Qwen 2.5 3B
- ✅ Safetensors: Safe tensor format for security
- ✅ QNN Ready: Prepared for Qualcomm Neural Network conversion
- ✅ Multilingual: Supports English, Chinese, and other languages
- ✅ Production Ready: Suitable for production deployments
System Requirements
Minimum Requirements
- CPU: Intel i5-8400 / AMD Ryzen 5 2600 or better
- RAM: 8GB system memory
- Storage: 10GB free space
- OS: Windows 10/11, macOS 10.15+, Ubuntu 18.04+
Recommended Requirements
- CPU: Intel i7-10700K / AMD Ryzen 7 3700X or better
- RAM: 16GB system memory
- GPU: NVIDIA RTX 3060 (8GB VRAM) or better
- Storage: 20GB free SSD space
Supported Devices
- Desktop: Windows, macOS, Linux
- Cloud: AWS, Google Cloud, Azure
- Edge: NVIDIA Jetson Nano, Raspberry Pi 4 (8GB)
- Mobile: iOS (via Core ML), Android (via TensorFlow Lite)
Usage
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('marcusmi4n/qwen2.5-3b-original')
tokenizer = AutoTokenizer.from_pretrained('marcusmi4n/qwen2.5-3b-original')
# Generate text
inputs = tokenizer("Hello, I am", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
Chinese Text Generation
# Chinese text generation
inputs = tokenizer("你好,我是", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Multilingual Support
# English
inputs = tokenizer("The weather is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Chinese
inputs = tokenizer("今天天气", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
QNN Conversion Pipeline
This model can be converted to QNN format using the following pipeline:
1. Quantization
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original
2. ONNX Conversion
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original
3. QNN Compilation
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original
Performance
- Inference Speed: ~20-30 tokens/sec on modern GPU
- Memory Usage: ~6GB VRAM for inference
- Quality: High-quality text generation
- Languages: Excellent performance in English and Chinese
- Latency: <100ms for short prompts, <500ms for long prompts
Installation
pip install transformers torch accelerate
Files Included
model-00001-of-00002.safetensors- Model weights part 1model-00002-of-00002.safetensors- Model weights part 2model.safetensors.index.json- Model indexconfig.json- Model configurationtokenizer.json- Tokenizertokenizer_config.json- Tokenizer configurationvocab.json- Vocabularymerges.txt- BPE mergesspecial_tokens_map.json- Special tokensgeneration_config.json- Generation configurationmodel_info.json- Model informationLICENSE- License file
中文
模型概述
此存储库包含为QNN部署和优化准备的原始Qwen 2.5 3B模型。该模型未经修改,可转换为包括ONNX和QNN在内的各种格式。
模型详情
- 基础模型: Qwen/Qwen2.5-3B
- 架构: Qwen2ForCausalLM
- 参数: ~3B
- 语言: 英语、中文等
- 格式: PyTorch (Safetensors)
- 大小: ~6.17GB
特性
- ✅ 原始模型: 未经修改的Qwen 2.5 3B
- ✅ Safetensors: 安全的张量格式
- ✅ QNN就绪: 为Qualcomm神经网络转换准备
- ✅ 多语言: 支持英语、中文和其他语言
- ✅ 生产就绪: 适合生产部署
系统要求
最低要求
- CPU: Intel i5-8400 / AMD Ryzen 5 2600或更好
- RAM: 8GB系统内存
- 存储: 10GB可用空间
- OS: Windows 10/11, macOS 10.15+, Ubuntu 18.04+
推荐要求
- CPU: Intel i7-10700K / AMD Ryzen 7 3700X或更好
- RAM: 16GB系统内存
- GPU: NVIDIA RTX 3060 (8GB VRAM)或更好
- 存储: 20GB可用SSD空间
支持的设备
- 桌面: Windows, macOS, Linux
- 云: AWS, Google Cloud, Azure
- 边缘: NVIDIA Jetson Nano, Raspberry Pi 4 (8GB)
- 移动: iOS (通过Core ML), Android (通过TensorFlow Lite)
使用方法
基本使用
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('marcusmi4n/qwen2.5-3b-original')
tokenizer = AutoTokenizer.from_pretrained('marcusmi4n/qwen2.5-3b-original')
# 生成文本
inputs = tokenizer("你好,我是", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
多语言支持
# 英语
inputs = tokenizer("The weather is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# 中文
inputs = tokenizer("今天天气", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
QNN转换流程
此模型可以使用以下流程转换为QNN格式:
1. 量化
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original
2. ONNX转换
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original
3. QNN编译
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original
性能
- 推理速度: 现代GPU上约20-30令牌/秒
- 内存使用: 推理约6GB VRAM
- 质量: 高质量文本生成
- 语言: 英语和中文性能优异
- 延迟: 短提示<100ms,长提示<500ms
安装
pip install transformers torch accelerate
Author: Mukwaya Mark
- Downloads last month
- 4
Model tree for marcusmi4n/qwen2.5-3b-original
Base model
Qwen/Qwen2.5-3B