jeffra committed on
Commit e78c651 · verified · 1 Parent(s): 49f70b2

Update README.md

Files changed (1)
  1. README.md +25 -3
README.md CHANGED
@@ -1,3 +1,25 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ ---
+
+ # ArcticSpeculator
+
+ Build the fastest OSS vllm-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
+
+ <!--We compare the throughput (tokens/s) of existing vllm-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100 as below:
+
+ | method | ShareGPT | HumanEval |
+ |--------------------------------------|----------------|--------------|
+ | VLLM V1 Baseline | 84.1 | 84.1 |
+ | VLLM V1 Eagle | 102.2 | 112.0 |
+ | VLLM V1 Eagle3 | 77.7 | 85.3 |
+ | VLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
+ | ArcticSpeculator | **172.4** | **203.7** |
+ -->
+
+ For more details about ArcticSpeculator and how to use it:
+
+ * ❄️ [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)]()
+ * 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)
+
+ See all of the speculators we have released via our [Speculators Collection](https://huggingface.co/collections/Snowflake/speculators-6812b07f3186d13e243022e4)
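
The throughput table in this diff sits inside an HTML comment, so it does not render in the README and its numbers should be treated as unpublished. For readers who want to read those figures as relative speedups, the following is a minimal sketch that divides each method's tokens/s by the baseline row. The values are copied from the commented-out table (Llama3.1-70B-Instruct on 8xH100, per that comment). The `speedups` helper and the `THROUGHPUT` dictionary are hypothetical names introduced only for illustration and are not part of ArcticTraining or ArcticInference.

```python
# Throughput in tokens/s, copied verbatim from the commented-out table in the diff.
THROUGHPUT = {
    "VLLM V1 Baseline": {"ShareGPT": 84.1, "HumanEval": 84.1},
    "VLLM V1 Eagle": {"ShareGPT": 102.2, "HumanEval": 112.0},
    "VLLM V1 Eagle3": {"ShareGPT": 77.7, "HumanEval": 85.3},
    "VLLM V0 MLP-Speculator (IBM)": {"ShareGPT": 77.9, "HumanEval": 66.7},
    "ArcticSpeculator": {"ShareGPT": 172.4, "HumanEval": 203.7},
}


def speedups(table, baseline="VLLM V1 Baseline"):
    """Return each method's throughput divided by the baseline's, per benchmark.

    Hypothetical helper for illustration only; values are rounded to 2 decimals.
    """
    base = table[baseline]
    return {
        method: {bench: round(value / base[bench], 2) for bench, value in row.items()}
        for method, row in table.items()
    }


if __name__ == "__main__":
    for method, row in speedups(THROUGHPUT).items():
        print(f"{method}: {row}")
    # ArcticSpeculator: {'ShareGPT': 2.05, 'HumanEval': 2.42}
```

A value below 1.0 in the output means the method was slower than the baseline on that benchmark in this particular measurement.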