innospark committed · verified · Commit 8556243 · 1 Parent(s): ac53162

Upload 2 files

Files changed (2)
  1. README.md +97 -1
  2. README_zh.md +97 -1
README.md CHANGED
@@ -41,7 +41,103 @@
  - **Model Scoring Dataset**: [HPC-LLM-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-LLM-8k)
  - **Human Scoring Dataset**: [HPC-Human-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-Human-8k)
 
- ## 🚀 Core Features
+ ## 🚀 Quickstart
+
+ Here is a code snippet showing how to load the tokenizer and model and generate content with `apply_chat_template`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"  # the device to load the model onto
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "sii-research/InnoSpark-72B-0710",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("sii-research/InnoSpark-72B-0710")
+
+ prompt = "Introduce yourself in detail."
+ messages = [
+     {"role": "system", "content": "You are InnoSpark(启创), created by Shanghai Innovation Institute (上海创智学院) and East China Normal University(华东师范大学). You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=512
+ )
+ # Keep only the newly generated tokens, dropping the prompt
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
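+
+ To continue the conversation, append the assistant reply and a new user turn to `messages`, then repeat the template-and-generate steps. A minimal sketch reusing the variables from the snippet above (the follow-up question is only an illustration):
+
+ ```python
+ messages.append({"role": "assistant", "content": response})
+ messages.append({"role": "user", "content": "Summarize that in one sentence."})  # hypothetical follow-up
+
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ model_inputs = tokenizer([text], return_tensors="pt").to(device)
+ generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
+ # Strip the prompt tokens before decoding, as above
+ generated_ids = [o[len(i):] for i, o in zip(model_inputs.input_ids, generated_ids)]
+ print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
+ ```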
+
+ ### vLLM
+
+ We recommend deploying the model on 4 A100 GPUs. You can start the vLLM server with the following command in a terminal:
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server \
+     --served-model-name InnoSpark \
+     --model path/to/InnoSpark \
+     --gpu-memory-utilization 0.98 \
+     --tensor-parallel-size 4 \
+     --port 6000
+ ```
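+
+ The server exposes an OpenAI-compatible API, so any OpenAI-style client can query it. A minimal sketch using the `openai` Python package (v1+), assuming the server above is running locally on port 6000:
+
+ ```python
+ from openai import OpenAI
+
+ # vLLM ignores the API key, but the client requires one to be set
+ client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")
+
+ completion = client.chat.completions.create(
+     model="InnoSpark",  # must match --served-model-name
+     messages=[{"role": "user", "content": "Introduce yourself in detail."}],
+ )
+ print(completion.choices[0].message.content)
+ ```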
+
+ Then, you can use the following code on the client side:
+
+ ```python
+ import requests
+ import json
+
+ def innospark_stream(inputs, history):
+     url = "http://localhost:6000/v1/chat/completions"
+
+     history += [{"role": "user", "content": inputs}]
+
+     headers = {"User-Agent": "vLLM Client"}
+
+     payload = {
+         "model": "InnoSpark",
+         "stream": True,
+         "messages": history,
+     }
+     response = requests.post(url, headers=headers, json=payload, stream=True)
+
+     assistant_reply = ""  # accumulates the streamed reply
+     for chunk in response.iter_lines(chunk_size=1, decode_unicode=False, delimiter=b"\n"):
+         if chunk:
+             string_data = chunk.decode("utf-8")
+             try:
+                 # Each SSE line is "data: {...}"; strip the 6-character "data: " prefix
+                 json_data = json.loads(string_data[6:])
+                 delta_content = json_data["choices"][0]["delta"]["content"]
+                 assistant_reply += delta_content
+                 yield delta_content
+             except KeyError:
+                 # The first chunk carries only the role, no content
+                 delta_content = json_data["choices"][0]["delta"]["role"]
+             except json.JSONDecodeError:
+                 # The final line is "data: [DONE]"; record the full reply in history
+                 history += [{
+                     "role": "assistant",
+                     "content": assistant_reply,
+                     "tool_calls": []
+                 }]
+                 assert chunk.decode("utf-8")[6:] == "[DONE]"
+
+ inputs = "hi"
+ history = []
+ for response_text in innospark_stream(inputs, history):
+     print(response_text, end="")
+ ```
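+
+ Equivalently, the hand-rolled SSE parsing above can be replaced by the `openai` package's built-in streaming, which yields the same per-token deltas. A minimal sketch under the same server assumptions:
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")
+
+ stream = client.chat.completions.create(
+     model="InnoSpark",
+     messages=[{"role": "user", "content": "hi"}],
+     stream=True,
+ )
+ for chunk in stream:
+     delta = chunk.choices[0].delta.content
+     if delta:  # role-only and final chunks carry no content
+         print(delta, end="")
+ ```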
+
+ ## 🌟 Core Features
 
  ### 🎯 Open Source Product Matrix
 
README_zh.md CHANGED
@@ -41,7 +41,103 @@
  - **Model Scoring Dataset**: [HPC-LLM-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-LLM-8k)
  - **Human Scoring Dataset**: [HPC-Human-8k](https://huggingface.co/datasets/ECNU-InnoSpark/HPC-Human-8k)
 
- ## 🚀 Core Features
+ ## 🚀 Quickstart
+
+ Here is a code snippet showing how to load the tokenizer and model and generate content with `apply_chat_template`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"  # the device to load the model onto
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "sii-research/InnoSpark-72B-0710",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("sii-research/InnoSpark-72B-0710")
+
+ prompt = "详细介绍一下你自己。"
+ messages = [
+     {"role": "system", "content": "You are InnoSpark(启创), created by Shanghai Innovation Institute (上海创智学院) and East China Normal University(华东师范大学). You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=512
+ )
+ # Keep only the newly generated tokens, dropping the prompt
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
+
+ ### vLLM Deployment
+
+ We recommend deploying the model on 4 A100 GPUs. You can start the vLLM server with the following command in a terminal:
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server \
+     --served-model-name InnoSpark \
+     --model path/to/InnoSpark \
+     --gpu-memory-utilization 0.98 \
+     --tensor-parallel-size 4 \
+     --port 6000
+ ```
+
+ Then, you can use the following code on the client side:
+
+ ```python
+ import requests
+ import json
+
+ def innospark_stream(inputs, history):
+     url = "http://localhost:6000/v1/chat/completions"
+
+     history += [{"role": "user", "content": inputs}]
+
+     headers = {"User-Agent": "vLLM Client"}
+
+     payload = {
+         "model": "InnoSpark",
+         "stream": True,
+         "messages": history,
+     }
+     response = requests.post(url, headers=headers, json=payload, stream=True)
+
+     assistant_reply = ""  # accumulates the streamed reply
+     for chunk in response.iter_lines(chunk_size=1, decode_unicode=False, delimiter=b"\n"):
+         if chunk:
+             string_data = chunk.decode("utf-8")
+             try:
+                 # Each SSE line is "data: {...}"; strip the 6-character "data: " prefix
+                 json_data = json.loads(string_data[6:])
+                 delta_content = json_data["choices"][0]["delta"]["content"]
+                 assistant_reply += delta_content
+                 yield delta_content
+             except KeyError:
+                 # The first chunk carries only the role, no content
+                 delta_content = json_data["choices"][0]["delta"]["role"]
+             except json.JSONDecodeError:
+                 # The final line is "data: [DONE]"; record the full reply in history
+                 history += [{
+                     "role": "assistant",
+                     "content": assistant_reply,
+                     "tool_calls": []
+                 }]
+                 assert chunk.decode("utf-8")[6:] == "[DONE]"
+
+ inputs = "hi"
+ history = []
+ for response_text in innospark_stream(inputs, history):
+     print(response_text, end="")
+ ```
+
+ ## 🌟 Core Features
 
  ### 🎯 Open Source Product Matrix