--- |
|
|
language: |
|
|
- en |
|
|
- zh |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
base_model: Qwen/Qwen2.5-3B |
|
|
tags: |
|
|
- qwen2.5 |
|
|
- text-generation |
|
|
- pytorch |
|
|
- multilingual |
|
|
- qnn-ready |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# Qwen 2.5 3B - QNN Ready / Qwen 2.5 3B - QNN 就绪
|
|
|
|
|
## English |
|
|
|
|
|
### Model Overview |
|
|
|
|
|
This repository contains the original Qwen 2.5 3B model prepared for QNN deployment and optimization. The model is unmodified and ready for conversion to various formats including ONNX and QNN. |
|
|
|
|
|
### Model Details |
|
|
|
|
|
- **Base Model**: Qwen/Qwen2.5-3B |
|
|
- **Architecture**: Qwen2ForCausalLM |
|
|
- **Parameters**: ~3B |
|
|
- **Languages**: English, Chinese, and others |
|
|
- **Format**: PyTorch (Safetensors) |
|
|
- **Size**: ~6.17GB |
|
|
|
|
|
### Features |
|
|
|
|
|
- ✅ **Original Model**: Unmodified Qwen 2.5 3B |
|
|
- ✅ **Safetensors**: Weights stored in the safetensors format (no pickle-based code execution risk)
|
|
- ✅ **QNN Ready**: Prepared for Qualcomm Neural Network conversion |
|
|
- ✅ **Multilingual**: Supports English, Chinese, and other languages |
|
|
- ✅ **Production Ready**: Suitable for production deployments |
|
|
|
|
|
### System Requirements |
|
|
|
|
|
#### Minimum Requirements |
|
|
- **CPU**: Intel i5-8400 / AMD Ryzen 5 2600 or better |
|
|
- **RAM**: 8GB system memory |
|
|
- **Storage**: 10GB free space |
|
|
- **OS**: Windows 10/11, macOS 10.15+, Ubuntu 18.04+ |
|
|
|
|
|
#### Recommended Requirements |
|
|
- **CPU**: Intel i7-10700K / AMD Ryzen 7 3700X or better |
|
|
- **RAM**: 16GB system memory |
|
|
- **GPU**: NVIDIA RTX 3060 (8GB VRAM) or better |
|
|
- **Storage**: 20GB free SSD space |
|
|
|
|
|
#### Supported Devices |
|
|
- **Desktop**: Windows, macOS, Linux |
|
|
- **Cloud**: AWS, Google Cloud, Azure |
|
|
- **Edge**: NVIDIA Jetson Nano and Raspberry Pi 4 (8GB), with quantization strongly recommended at these memory budgets
|
|
- **Mobile**: iOS (via Core ML), Android (via TensorFlow Lite) |
|
|
|
|
|
### Usage |
|
|
|
|
|
#### Basic Usage |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    torch_dtype="auto",   # load in the checkpoint's native precision instead of FP32
    device_map="auto",    # place weights on GPU if available (requires accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")

# Generate text
inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
#### Chinese Text Generation |
|
|
|
|
|
```python
# Chinese text generation (reuses the model and tokenizer loaded above)
inputs = tokenizer("你好,我是", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
#### Multilingual Support |
|
|
|
|
|
```python
# English
inputs = tokenizer("The weather is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Chinese
inputs = tokenizer("今天天气", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
### QNN Conversion Pipeline |
|
|
|
|
|
This model can be converted to QNN format using the following pipeline: |
|
|
|
|
|
#### 1. Quantization |
|
|
```bash |
|
|
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
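The script above is specific to this repository's tooling. As a generic alternative sketch (not the script's actual method), the checkpoint can be loaded with 4-bit weight quantization through `bitsandbytes`; this assumes a CUDA GPU and `pip install bitsandbytes`:

```python
# Sketch: 4-bit quantized load via bitsandbytes (assumes CUDA + `pip install bitsandbytes`).
# Illustrative alternative only; not the repository's quantization script.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    quantization_config=bnb_config,
    device_map="auto",
)
```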
|
|
|
|
|
#### 2. ONNX Conversion |
|
|
```bash |
|
|
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
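`create_mock_onnx.py` is also repository-specific. For reference, a standard ONNX export can be done with Hugging Face Optimum; this sketch assumes `pip install optimum[onnxruntime]` and is independent of the scripts above:

```python
# Sketch: ONNX export via Hugging Face Optimum (assumes `pip install optimum[onnxruntime]`).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# export=True converts the PyTorch checkpoint to ONNX on the fly
onnx_model = ORTModelForCausalLM.from_pretrained("marcusmi4n/qwen2.5-3b-original", export=True)
onnx_model.save_pretrained("qwen2.5-3b-onnx")

# Save the tokenizer alongside so the ONNX folder is self-contained
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")
tokenizer.save_pretrained("qwen2.5-3b-onnx")
```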
|
|
|
|
|
#### 3. QNN Compilation |
|
|
```bash |
|
|
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
|
|
|
|
|
### Performance |
|
|
|
|
|
- **Inference Speed**: roughly 20-30 tokens/sec on a recent consumer GPU (hardware- and precision-dependent)
- **Memory Usage**: ~6GB VRAM for BF16/FP16 inference (roughly double for FP32)
- **Quality**: strong general-purpose text generation for a 3B-parameter model
- **Languages**: strongest performance in English and Chinese
- **Latency**: typically <100ms for short prompts and <500ms for long prompts, varying with hardware
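These figures depend heavily on hardware, precision, and prompt length, so it is worth measuring on your own setup. A minimal timing sketch, assuming `model` and `tokenizer` are already loaded as in Basic Usage:

```python
# Minimal throughput measurement (assumes `model` and `tokenizer` from Basic Usage).
import time

inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```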
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch accelerate |
|
|
``` |
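A quick check that the environment is ready before downloading the ~6GB checkpoint:

```python
# Environment sanity check before downloading the checkpoint.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```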
|
|
|
|
|
### Files Included |
|
|
|
|
|
- `model-00001-of-00002.safetensors` - Model weights part 1 |
|
|
- `model-00002-of-00002.safetensors` - Model weights part 2 |
|
|
- `model.safetensors.index.json` - Model index |
|
|
- `config.json` - Model configuration |
|
|
- `tokenizer.json` - Tokenizer |
|
|
- `tokenizer_config.json` - Tokenizer configuration |
|
|
- `vocab.json` - Vocabulary |
|
|
- `merges.txt` - BPE merges |
|
|
- `special_tokens_map.json` - Special tokens |
|
|
- `generation_config.json` - Generation configuration |
|
|
- `model_info.json` - Model information |
|
|
- `LICENSE` - License file |
|
|
|
|
|
--- |
|
|
|
|
|
## 中文 |
|
|
|
|
|
### 模型概述 |
|
|
|
|
|
此存储库包含为QNN部署和优化准备的原始Qwen 2.5 3B模型。该模型未经修改,可转换为包括ONNX和QNN在内的各种格式。 |
|
|
|
|
|
### 模型详情 |
|
|
|
|
|
- **基础模型**: Qwen/Qwen2.5-3B |
|
|
- **架构**: Qwen2ForCausalLM |
|
|
- **参数**: ~3B |
|
|
- **语言**: 英语、中文等 |
|
|
- **格式**: PyTorch (Safetensors) |
|
|
- **大小**: ~6.17GB |
|
|
|
|
|
### 特性 |
|
|
|
|
|
- ✅ **原始模型**: 未经修改的Qwen 2.5 3B |
|
|
- ✅ **Safetensors**: 权重以safetensors格式存储(无pickle代码执行风险)
|
|
- ✅ **QNN就绪**: 为Qualcomm神经网络转换准备 |
|
|
- ✅ **多语言**: 支持英语、中文和其他语言 |
|
|
- ✅ **生产就绪**: 适合生产部署 |
|
|
|
|
|
### 系统要求 |
|
|
|
|
|
#### 最低要求 |
|
|
- **CPU**: Intel i5-8400 / AMD Ryzen 5 2600或更好 |
|
|
- **RAM**: 8GB系统内存 |
|
|
- **存储**: 10GB可用空间 |
|
|
- **OS**: Windows 10/11, macOS 10.15+, Ubuntu 18.04+ |
|
|
|
|
|
#### 推荐要求 |
|
|
- **CPU**: Intel i7-10700K / AMD Ryzen 7 3700X或更好 |
|
|
- **RAM**: 16GB系统内存 |
|
|
- **GPU**: NVIDIA RTX 3060 (8GB VRAM)或更好 |
|
|
- **存储**: 20GB可用SSD空间 |
|
|
|
|
|
#### 支持的设备 |
|
|
- **桌面**: Windows, macOS, Linux |
|
|
- **云**: AWS, Google Cloud, Azure |
|
|
- **边缘**: NVIDIA Jetson Nano、Raspberry Pi 4 (8GB),在此内存预算下强烈建议量化
|
|
- **移动**: iOS (通过Core ML), Android (通过TensorFlow Lite) |
|
|
|
|
|
### 使用方法 |
|
|
|
|
|
#### 基本使用 |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    torch_dtype="auto",   # 以权重的原生精度加载,而非 FP32
    device_map="auto",    # 如有 GPU 则自动放置权重(需要 accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")

# 生成文本
inputs = tokenizer("你好,我是", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
#### 多语言支持 |
|
|
|
|
|
```python
# 英语
inputs = tokenizer("The weather is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 中文
inputs = tokenizer("今天天气", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
### QNN转换流程 |
|
|
|
|
|
此模型可以使用以下流程转换为QNN格式: |
|
|
|
|
|
#### 1. 量化 |
|
|
```bash |
|
|
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
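上面的量化脚本是本仓库专用的。作为一个通用的替代示例(并非该脚本的实际实现),可以通过 `bitsandbytes` 以 4-bit 权重量化方式加载模型;此示例假设有 CUDA GPU 并已 `pip install bitsandbytes`:

```python
# 示例:通过 bitsandbytes 进行 4-bit 量化加载(假设有 CUDA 并已安装 bitsandbytes)。
# 仅供参考,并非本仓库量化脚本的实际实现。
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 将线性层权重量化为 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 量化方案
    bnb_4bit_compute_dtype=torch.bfloat16,  # 以 bf16 进行计算
)
model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    quantization_config=bnb_config,
    device_map="auto",
)
```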
|
|
|
|
|
#### 2. ONNX转换 |
|
|
```bash |
|
|
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
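`create_mock_onnx.py` 同样是本仓库专用脚本。作为参考,也可以使用 Hugging Face Optimum 进行标准的 ONNX 导出;此示例假设已 `pip install optimum[onnxruntime]`:

```python
# 示例:通过 Hugging Face Optimum 导出 ONNX(假设已安装 optimum[onnxruntime])。
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# export=True 会即时将 PyTorch 权重转换为 ONNX
onnx_model = ORTModelForCausalLM.from_pretrained("marcusmi4n/qwen2.5-3b-original", export=True)
onnx_model.save_pretrained("qwen2.5-3b-onnx")

# 同时保存分词器,使 ONNX 目录自包含
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")
tokenizer.save_pretrained("qwen2.5-3b-onnx")
```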
|
|
|
|
|
#### 3. QNN编译 |
|
|
```bash |
|
|
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original |
|
|
``` |
|
|
|
|
|
### 性能 |
|
|
|
|
|
- **推理速度**: 在较新的消费级GPU上约20-30令牌/秒(取决于硬件和精度)
- **内存使用**: BF16/FP16推理约需6GB VRAM(FP32约为两倍)
- **质量**: 对3B参数规模而言具有较强的通用文本生成能力
- **语言**: 英语和中文表现最佳
- **延迟**: 短提示通常<100ms,长提示<500ms,随硬件而变化
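这些数字高度依赖硬件、精度和提示长度,建议在自己的环境中实际测量。以下是一个最小的计时示例(假设已按"基本使用"加载 `model` 和 `tokenizer`):

```python
# 最小吞吐量测量示例(假设已加载 model 和 tokenizer)。
import time

inputs = tokenizer("你好,我是", return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# 只统计新生成的令牌,不含提示
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```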
|
|
|
|
|
### 安装 |
|
|
|
|
|
```bash |
|
|
pip install transformers torch accelerate |
|
|
``` |
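在下载约6GB的权重之前,可以先快速检查环境:

```python
# 下载权重前的环境自检。
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```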
|
|
|
|
|
--- |
|
|
|
|
|
**Author**: Mukwaya Mark |
|
|
|