Upload 2 files
- README.md +97 -1
- README_zh.md +97 -1
README.md
CHANGED

@@ -41,7 +41,103 @@
- **Model Scoring Dataset**: [HPC-LLM-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-LLM-8k)
- **Human Scoring Dataset**: [HPC-Human-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-Human-8k)

## 🚀 Quickstart

Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "sii-research/InnoSpark-72B-0710",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("sii-research/InnoSpark-72B-0710")

prompt = "Introduce yourself in detail."
messages = [
    {"role": "system", "content": "You are InnoSpark(启创), created by Shanghai Innovation Institute (上海创智学院) and East China Normal University(华东师范大学). You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
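
If you would rather see tokens printed as they are generated instead of decoding at the end, `transformers` also ships a `TextStreamer` that can be passed to `generate`. A minimal sketch, reusing the `model`, `tokenizer`, and `model_inputs` from above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated,
# skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    streamer=streamer
)
```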

### vLLM Deployment

We recommend deploying the model on 4 A100 GPUs. You can start the vLLM server with the following command in a terminal:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name InnoSpark --model path/to/InnoSpark --gpu-memory-utilization 0.98 --tensor-parallel-size 4 --port 6000
```
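
Since the server exposes an OpenAI-compatible API, you can also query it with the official `openai` Python client. A minimal non-streaming sketch, assuming the `openai` package is installed and using the model name and port from the command above:

```python
from openai import OpenAI

# vLLM does not check the API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="InnoSpark",
    messages=[{"role": "user", "content": "Introduce yourself in detail."}]
)
print(completion.choices[0].message.content)
```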

For streaming responses, you can call the endpoint directly with `requests` on the client side:

```python
import requests
import json


def Innospark_stream(inputs, history):
    # vLLM exposes an OpenAI-compatible chat completions endpoint
    url = 'http://localhost:6000/v1/chat/completions'

    history += [{"role": "user", "content": inputs}]

    headers = {"User-Agent": "vLLM Client"}

    pload = {
        "model": "InnoSpark",
        "stream": True,
        "messages": history
    }
    response = requests.post(url,
                             headers=headers,
                             json=pload,
                             stream=True)

    assistant_reply = ""  # accumulates the full reply across chunks
    for chunk in response.iter_lines(chunk_size=1,
                                     decode_unicode=False,
                                     delimiter=b"\n"):
        if chunk:
            # Each server-sent event looks like "data: {...}"; drop the "data: " prefix
            string_data = chunk.decode("utf-8")
            try:
                json_data = json.loads(string_data[6:])
                delta_content = json_data["choices"][0]["delta"]["content"]
                assistant_reply += delta_content
                yield delta_content
            except KeyError:
                # The first chunk carries only the role, not content
                delta_content = json_data["choices"][0]["delta"]["role"]
            except json.JSONDecodeError:
                # The final chunk is "data: [DONE]"; record the reply in history
                history += [{
                    "role": "assistant",
                    "content": assistant_reply,
                    "tool_calls": []
                }]
                assert string_data[6:] == '[DONE]'


inputs = 'hi'
history = []
for response_text in Innospark_stream(inputs, history):
    print(response_text, end='')
```
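
Since `Innospark_stream` appends both the user turn and the assistant reply to the `history` list in place, reusing the same list gives you a multi-turn conversation. A small usage sketch (the second question is only illustrative):

```python
history = []

# The same `history` list is passed on each call, so the server
# sees the full conversation when answering the second question.
for question in ["hi", "What can you help me with?"]:
    for token in Innospark_stream(question, history):
        print(token, end='')
    print()
```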

## 🌟 Core Features

### 🎯 Open Source Product Matrix
README_zh.md
CHANGED

@@ -41,7 +41,103 @@
- **Model Scoring Dataset**: [HPC-LLM-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-LLM-8k)
- **Human Scoring Dataset**: [HPC-Human-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-Human-8k)

## 🚀 Quickstart

Below is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "sii-research/InnoSpark-72B-0710",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("sii-research/InnoSpark-72B-0710")

prompt = "详细介绍一下你自己。"  # "Introduce yourself in detail."
messages = [
    {"role": "system", "content": "You are InnoSpark(启创), created by Shanghai Innovation Institute (上海创智学院) and East China Normal University(华东师范大学). You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
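
If you would rather see tokens printed as they are generated instead of decoding at the end, `transformers` also ships a `TextStreamer` that can be passed to `generate`. A minimal sketch, reusing the `model`, `tokenizer`, and `model_inputs` from above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated,
# skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    streamer=streamer
)
```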

### vLLM Deployment

We recommend deploying the model on 4 A100 GPUs. You can start the vLLM server with the following command in a terminal:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name InnoSpark --model path/to/InnoSpark --gpu-memory-utilization 0.98 --tensor-parallel-size 4 --port 6000
```
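
Since the server exposes an OpenAI-compatible API, you can also query it with the official `openai` Python client. A minimal non-streaming sketch, assuming the `openai` package is installed and using the model name and port from the command above:

```python
from openai import OpenAI

# vLLM does not check the API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="InnoSpark",
    messages=[{"role": "user", "content": "Introduce yourself in detail."}]
)
print(completion.choices[0].message.content)
```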

For streaming responses, you can call the endpoint directly with `requests` on the client side:

```python
import requests
import json


def Innospark_stream(inputs, history):
    # vLLM exposes an OpenAI-compatible chat completions endpoint
    url = 'http://localhost:6000/v1/chat/completions'

    history += [{"role": "user", "content": inputs}]

    headers = {"User-Agent": "vLLM Client"}

    pload = {
        "model": "InnoSpark",
        "stream": True,
        "messages": history
    }
    response = requests.post(url,
                             headers=headers,
                             json=pload,
                             stream=True)

    assistant_reply = ""  # accumulates the full reply across chunks
    for chunk in response.iter_lines(chunk_size=1,
                                     decode_unicode=False,
                                     delimiter=b"\n"):
        if chunk:
            # Each server-sent event looks like "data: {...}"; drop the "data: " prefix
            string_data = chunk.decode("utf-8")
            try:
                json_data = json.loads(string_data[6:])
                delta_content = json_data["choices"][0]["delta"]["content"]
                assistant_reply += delta_content
                yield delta_content
            except KeyError:
                # The first chunk carries only the role, not content
                delta_content = json_data["choices"][0]["delta"]["role"]
            except json.JSONDecodeError:
                # The final chunk is "data: [DONE]"; record the reply in history
                history += [{
                    "role": "assistant",
                    "content": assistant_reply,
                    "tool_calls": []
                }]
                assert string_data[6:] == '[DONE]'


inputs = 'hi'
history = []
for response_text in Innospark_stream(inputs, history):
    print(response_text, end='')
```
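
Since `Innospark_stream` appends both the user turn and the assistant reply to the `history` list in place, reusing the same list gives you a multi-turn conversation. A small usage sketch (the second question is only illustrative):

```python
history = []

# The same `history` list is passed on each call, so the server
# sees the full conversation when answering the second question.
for question in ["hi", "What can you help me with?"]:
    for token in Innospark_stream(question, history):
        print(token, end='')
    print()
```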

## 🌟 Core Features

### 🎯 Open Source Product Matrix