JRosenkranz committed · verified
Commit 0976041 · 1 Parent(s): 1143687

Update README.md

Files changed (1)
  1. README.md +62 -1
README.md CHANGED
@@ -70,4 +70,65 @@ pip install . --no-cache-dir
  python sample_client.py
  ```
 
- _Note: first prompt may be slower as there is a slight warmup time_
+ _Note: first prompt may be slower as there is a slight warmup time_
+
+ #### Install
+
+ ```bash
+ git clone https://github.com/foundation-model-stack/fms-extras
+ (cd fms-extras && \
+   git fetch origin pull/4/head:code_llama_variant && \
+   git checkout code_llama_variant && \
+   pip install -e .)
+ pip install transformers==4.35.0 sentencepiece numpy
+ ```
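+
+ _A quick sanity check that the editable install resolved before running the samples (a minimal sketch; the module name `fms_extras` is assumed from the repo layout):_
+
+ ```python
+ # Confirm the editable install and the pinned transformers version are importable.
+ import fms_extras  # noqa: F401
+ import transformers
+
+ print(transformers.__version__)  # expect 4.35.0
+ ```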
+
+ #### Run Sample
+
+ ##### batch_size=1 (compile + cudagraphs)
+
+ ```bash
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b_code \
+ --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --speculator_path=ibm-fms/codellama-13b-accelerator \
+ --speculator_source=hf \
+ --top_k_tokens_per_head=4,3,2,2,2,2,2 \
+ --prompt_type=code \
+ --compile \
+ --compile_mode=reduce-overhead
+ ```
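+
+ _The `--compile_mode=reduce-overhead` flag above maps onto `torch.compile`'s `mode` argument, which additionally captures CUDA graphs to cut per-step launch overhead; the remaining samples use the default mode. A standalone sketch of the distinction (illustrative only, not the script's own code):_
+
+ ```python
+ import torch
+
+ def step(x):
+     return torch.relu(x @ x)
+
+ compiled = torch.compile(step)  # default mode: compiled kernels only
+ # "reduce-overhead" additionally captures CUDA graphs (needs a GPU at call time)
+ compiled_cg = torch.compile(step, mode="reduce-overhead")
+ ```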
+
+ ##### batch_size=1 (compile)
+
+ ```bash
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b_code \
+ --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --speculator_path=ibm-fms/codellama-13b-accelerator \
+ --speculator_source=hf \
+ --top_k_tokens_per_head=4,3,2,2,2,2,2 \
+ --prompt_type=code \
+ --compile
+ ```
+
+ ##### batch_size=4 (compile)
+
+ ```bash
+ python fms-extras/scripts/paged_speculative_inference.py \
+ --variant=13b_code \
+ --model_path=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --model_source=hf \
+ --tokenizer=/path/to/llama/CodeLlama-13b-Instruct-hf \
+ --speculator_path=ibm-fms/codellama-13b-accelerator \
+ --speculator_source=hf \
+ --batch_input \
+ --top_k_tokens_per_head=4,3,2,2,2,2,2 \
+ --prompt_type=code \
+ --compile
+ ```
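+
+ _`--top_k_tokens_per_head` sets how many candidate tokens each speculator head keeps per step. Assuming the per-head choices fan out multiplicatively into candidate suffixes (standard top-k tree speculation), the setting above bounds the candidate count as follows (illustrative arithmetic only):_
+
+ ```python
+ # Upper bound on candidate suffixes implied by --top_k_tokens_per_head=4,3,2,2,2,2,2,
+ # assuming each head's top-k choices multiply into a candidate tree.
+ from math import prod
+
+ top_k_per_head = [4, 3, 2, 2, 2, 2, 2]  # one entry per speculator head
+ print(prod(top_k_per_head))  # 384 seven-token candidate suffixes
+ ```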
+
+ Sample code can be found [here](https://github.com/foundation-model-stack/fms-extras/pull/18).
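+
+ _Because `--speculator_source=hf` resolves `ibm-fms/codellama-13b-accelerator` from the Hugging Face Hub, the speculator weights can optionally be pre-fetched so the first run doesn't block on the download (a minimal sketch using `huggingface_hub`, which ships as a `transformers` dependency):_
+
+ ```python
+ # Optional: warm the local cache with the speculator weights ahead of time.
+ from huggingface_hub import snapshot_download
+
+ path = snapshot_download(repo_id="ibm-fms/codellama-13b-accelerator")
+ print(path)  # local cache directory holding the speculator checkpoint
+ ```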