Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@ license: cc-by-nc-4.0
 
 Build the fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
 
-We compare the throughput (tokens/s) of existing vllm-based speculative decoding
+We compare the throughput (tokens/s) of existing vllm-based speculative decoding systems for Llama-3.1-70B-Instruct on 8xH100 below:
 
 | method | ShareGPT | HumanEval |
 |--------------------------------------|----------------|--------------|
@@ -25,8 +25,7 @@ We also release ArcticSpeculator checkpoints we trained with [ArcticTraining](ht
 
 | model | ArcticSpeculator |
 |---- | ---- |
-| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | |
-| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | |
-| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | |
-
-| [openhands-lm-32b-v0.1-ep3](https://huggingface.co/all-hands/openhands-lm-32b-v0.1-ep3)| | -->
+| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct) |
+| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct) |
+| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | [Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct) |
+| [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct) |
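For context on how the checkpoints added in this change are consumed, here is a minimal sketch of pairing one of the listed base models with its ArcticSpeculator draft through vLLM's offline `LLM` API. This is an illustration, not the project's documented recipe: it assumes ArcticInference is installed (`pip install arctic-inference`) and registers its speculative method with vLLM under the name `arctic`, and the `speculative_config` keys and draft-token count shown here are assumptions to verify against the ArcticInference README and the individual model cards.

```python
# Hedged usage sketch: Llama-3.1-70B-Instruct with its Arctic-LSTM-Speculator
# draft model via vLLM speculative decoding. The "arctic" method name and the
# num_speculative_tokens value are assumptions; check the ArcticInference docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=8,  # 8xH100, matching the throughput comparison above
    speculative_config={
        "method": "arctic",  # assumed method registered by the ArcticInference plugin
        "model": "Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct",
        "num_speculative_tokens": 3,  # illustrative draft length
    },
)

outputs = llm.generate(
    ["Write a Python function that checks whether a number is prime."],
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

The same pattern would apply to the other rows of the table by swapping the base model and its matching Arctic-LSTM-Speculator checkpoint.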