jeffra committed · Commit aaf034e · verified · Parent: fd2df5c

Update README.md

Files changed (1): README.md (+5 -6)
README.md CHANGED
@@ -6,7 +6,7 @@ license: cc-by-nc-4.0
 
  Build a fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
 
- We compare the throughput (tokens/s) of existing vllm-based speculative decoding systmes for Llama3.1-70B-Instruct on 8xH100 as below:
+ We compare the throughput (tokens/s) of existing vllm-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100 as below:
 
  | method | ShareGPT | HumanEval |
  |--------------------------------------|----------------|--------------|
@@ -25,8 +25,7 @@ We also release ArcticSpeculator checkpoints we trained with [ArcticTraining](ht
 
  | model | ArcticSpeculator |
  |---- | ---- |
- | [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | |
- | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | |
- | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | |
- <!-- | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | |
- | [openhands-lm-32b-v0.1-ep3](https://huggingface.co/all-hands/openhands-lm-32b-v0.1-ep3)| | -->
+ | [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct) |
+ | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct) |
+ | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | [Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Qwen2.5-32B-Instruct) |
+ | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct](https://huggingface.co/Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct) |
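
For context, the checkpoints added in this diff are draft models meant to be plugged into vLLM through ArcticInference. Below is a minimal sketch of how one of them might be wired into vLLM's offline API; the `"arctic"` method name, the `speculative_config` keys, and the draft-token count are assumptions for illustration, not settings taken from this commit or verified against the released code.

```python
# Minimal sketch (assumptions noted): serve Llama-3.3-70B-Instruct with an
# Arctic-LSTM-Speculator draft model via vLLM. Assumes vLLM >= 0.8 (dict-style
# `speculative_config`) and that the arctic-inference package is installed so its
# speculative-decoding method is registered as a vLLM plugin.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    tensor_parallel_size=8,  # 8xH100, matching the benchmark setup in the README
    speculative_config={
        "method": "arctic",  # assumed plugin-registered method name
        "model": "Snowflake/Arctic-LSTM-Speculator-Llama-3.3-70B-Instruct",
        "num_speculative_tokens": 3,  # assumed draft length; tune per workload
    },
)

outputs = llm.generate(
    ["Write a Python function that checks whether a string is a palindrome."],
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```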