---
language: 
- en
- zh
license: apache-2.0
library_name: transformers
base_model: Qwen/Qwen2.5-3B
tags:
- qwen2.5
- text-generation
- pytorch
- multilingual
- qnn-ready
pipeline_tag: text-generation
---

# Qwen 2.5 3B - QNN Ready / Qwen 2.5 3B - QNN就绪

## English

### Model Overview

This repository contains the original Qwen 2.5 3B model prepared for QNN deployment and optimization. The model is unmodified and ready for conversion to various formats including ONNX and QNN.

### Model Details

- **Base Model**: Qwen/Qwen2.5-3B
- **Architecture**: Qwen2ForCausalLM
- **Parameters**: ~3B
- **Languages**: English, Chinese, and others
- **Format**: PyTorch (Safetensors)
- **Size**: ~6.17GB
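
These details can be verified locally without downloading the weights; a minimal check against the standard Qwen2 `config.json` fields:

```python
from transformers import AutoConfig

# Load only the configuration (a few KB), not the ~6GB of weights.
config = AutoConfig.from_pretrained("marcusmi4n/qwen2.5-3b-original")
print(config.architectures)      # expected: ['Qwen2ForCausalLM']
print(config.hidden_size)        # model width
print(config.num_hidden_layers)  # transformer depth
```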

### Features

- **Original Model**: Unmodified Qwen 2.5 3B
- **Safetensors**: Weights stored in the safetensors format, which cannot execute arbitrary code on load (see the sketch below)
- **QNN Ready**: Prepared for Qualcomm Neural Network (QNN) conversion
- **Multilingual**: Supports English, Chinese, and other languages
- **Production Ready**: Suitable for production deployments
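
A quick illustration of that safety property: shard contents can be listed without deserializing any Python objects (shard name taken from the Files Included list below):

```python
from safetensors import safe_open

# Inspect tensor names and shapes straight from the file header;
# no pickle is involved and no weights are loaded into memory.
with safe_open("model-00001-of-00002.safetensors", framework="pt") as f:
    for name in list(f.keys())[:5]:
        print(name, f.get_slice(name).get_shape())
```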

### System Requirements

#### Minimum Requirements
- **CPU**: Intel i5-8400 / AMD Ryzen 5 2600 or better
- **RAM**: 8GB system memory
- **Storage**: 10GB free space
- **OS**: Windows 10/11, macOS 10.15+, Ubuntu 18.04+

#### Recommended Requirements
- **CPU**: Intel i7-10700K / AMD Ryzen 7 3700X or better
- **RAM**: 16GB system memory
- **GPU**: NVIDIA RTX 3060 (8GB VRAM) or better
- **Storage**: 20GB free SSD space

#### Supported Devices
- **Desktop**: Windows, macOS, Linux
- **Cloud**: AWS, Google Cloud, Azure
- **Edge**: NVIDIA Jetson Nano, Raspberry Pi 4 (8GB)
- **Mobile**: iOS (via Core ML), Android (via TensorFlow Lite)
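
A small sketch for matching the load configuration to whichever tier is available (plain PyTorch device detection, nothing model-specific assumed):

```python
import torch

# Use a GPU when one is available (recommended tier), else fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 roughly halves memory on GPU; fp32 is the safe default on CPU.
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"Loading on {device} with {dtype}")
```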

### Usage

#### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    torch_dtype="auto",   # keep the checkpoint's dtype instead of upcasting to fp32
    device_map="auto",    # place the model on GPU when available (needs accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")

# Generate text
inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
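
Equivalently, the high-level `pipeline` API wraps the same load-and-generate steps in a single call:

```python
from transformers import pipeline

# The pipeline handles tokenization, generation, and decoding internally.
generator = pipeline(
    "text-generation",
    model="marcusmi4n/qwen2.5-3b-original",
    torch_dtype="auto",
    device_map="auto",
)
print(generator("Hello, I am", max_new_tokens=50)[0]["generated_text"])
```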

#### Chinese Text Generation

```python
# Chinese text generation
inputs = tokenizer("你好,我是", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### Multilingual Support

```python
# English
inputs = tokenizer("The weather is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Chinese
inputs = tokenizer("今天天气", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
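
Prompts in different languages can also be generated in a single batch, reusing `model` and `tokenizer` from above. A short sketch; note that decoder-only models need left padding for batched generation:

```python
# Left padding keeps generation aligned to the end of each prompt.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:  # guard in case no pad token is set
    tokenizer.pad_token = tokenizer.eos_token

prompts = ["The weather is", "今天天气"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```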

### QNN Conversion Pipeline

This model can be converted to QNN format using the following pipeline:

#### 1. Quantization
```bash
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original
```

#### 2. ONNX Conversion
```bash
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original
```

#### 3. QNN Compilation
```bash
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original
```
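
The three scripts above belong to this project's own tooling. As a generic alternative sketch for the ONNX step, Hugging Face Optimum's exporter API can produce an ONNX graph from the same checkpoint (assumes `pip install optimum[exporters]`; the output directory name is illustrative):

```python
from optimum.exporters.onnx import main_export

# Export the decoder with KV-cache inputs/outputs ("with-past") so the
# ONNX graph supports efficient incremental decoding.
main_export(
    "marcusmi4n/qwen2.5-3b-original",
    output="qwen2.5-3b-onnx",
    task="text-generation-with-past",
)
```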

### Performance

- **Inference Speed**: ~20-30 tokens/sec on a modern GPU
- **Memory Usage**: ~6GB VRAM for inference
- **Quality**: High-quality text generation
- **Languages**: Excellent performance in English and Chinese
- **Latency**: <100ms for short prompts, <500ms for long prompts
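
Figures like these can be reproduced with a simple timing loop, reusing `model` and `tokenizer` from the usage examples above; a minimal sketch (results vary with hardware, dtype, and prompt length):

```python
import time

inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, excluding the prompt.
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```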

### Installation

```bash
pip install transformers torch accelerate
```

### Files Included

- `model-00001-of-00002.safetensors` - Model weights part 1
- `model-00002-of-00002.safetensors` - Model weights part 2
- `model.safetensors.index.json` - Model index
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer
- `tokenizer_config.json` - Tokenizer configuration
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- `special_tokens_map.json` - Special tokens
- `generation_config.json` - Generation configuration
- `model_info.json` - Model information
- `LICENSE` - License file
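
The shard layout can be sanity-checked from `model.safetensors.index.json`, which records the total checkpoint size and maps every weight to its shard:

```python
import json

with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"] / 1e9, "GB")  # ~6.17 GB expected
print(sorted(set(index["weight_map"].values())))    # the two shards listed above
```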

---

## 中文

### 模型概述

此存储库包含为QNN部署和优化准备的原始Qwen 2.5 3B模型。该模型未经修改,可转换为包括ONNX和QNN在内的各种格式。

### 模型详情

- **基础模型**: Qwen/Qwen2.5-3B
- **架构**: Qwen2ForCausalLM
- **参数**: ~3B
- **语言**: 英语、中文等
- **格式**: PyTorch (Safetensors)
- **大小**: ~6.17GB

### 特性

- **原始模型**: 未经修改的Qwen 2.5 3B
- **Safetensors**: 安全的张量格式
- **QNN就绪**: 为Qualcomm神经网络转换准备
- **多语言**: 支持英语、中文和其他语言
- **生产就绪**: 适合生产部署

### 系统要求

#### 最低要求
- **CPU**: Intel i5-8400 / AMD Ryzen 5 2600或更好
- **RAM**: 8GB系统内存
- **存储**: 10GB可用空间
- **OS**: Windows 10/11, macOS 10.15+, Ubuntu 18.04+

#### 推荐要求
- **CPU**: Intel i7-10700K / AMD Ryzen 7 3700X或更好
- **RAM**: 16GB系统内存
- **GPU**: NVIDIA RTX 3060 (8GB VRAM)或更好
- **存储**: 20GB可用SSD空间

#### 支持的设备
- **桌面**: Windows, macOS, Linux
- **云**: AWS, Google Cloud, Azure
- **边缘**: NVIDIA Jetson Nano, Raspberry Pi 4 (8GB)
- **移动**: iOS (通过Core ML), Android (通过TensorFlow Lite)

### 使用方法

#### 基本使用

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "marcusmi4n/qwen2.5-3b-original",
    torch_dtype="auto",   # 保持检查点原有精度,避免升为fp32
    device_map="auto",    # 有GPU时自动使用GPU(需要accelerate)
)
tokenizer = AutoTokenizer.from_pretrained("marcusmi4n/qwen2.5-3b-original")

# 生成文本
inputs = tokenizer("你好,我是", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### 多语言支持

```python
# 英语
inputs = tokenizer("The weather is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# 中文
inputs = tokenizer("今天天气", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### QNN转换流程

此模型可以使用以下流程转换为QNN格式:

#### 1. 量化
```bash
python scripts/simple_quantize_abeja.py --model-path marcusmi4n/qwen2.5-3b-original
```

#### 2. ONNX转换
```bash
python scripts/create_mock_onnx.py --model-path marcusmi4n/qwen2.5-3b-original
```

#### 3. QNN编译
```bash
python scripts/mock_qnn_compile.py --model-path marcusmi4n/qwen2.5-3b-original
```

### 性能

- **推理速度**: 现代GPU上约20-30令牌/秒
- **内存使用**: 推理约6GB VRAM
- **质量**: 高质量文本生成
- **语言**: 英语和中文性能优异
- **延迟**: 短提示<100ms,长提示<500ms

### 安装

```bash
pip install transformers torch accelerate
```

---

**Author**: Mukwaya Mark