Update README.md
README.md
CHANGED
````diff
@@ -70,61 +70,4 @@ pip install . --no-cache-dir
 python sample_client.py
 ```
 
-_Note: first prompt may be slower as there is a slight warmup time_
-
-### Minimal Sample
-
-*To try this out with the fms-native compiled model, please execute the following:*
-
-#### Install
-
-```bash
-git clone https://github.com/foundation-model-stack/fms-extras
-(cd fms-extras && pip install -e .)
-pip install transformers==4.35.0 sentencepiece numpy
-```
-
-#### Run Sample
-
-##### batch_size=1 (compile + cudagraphs)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile \
-    --compile_mode=reduce-overhead
-```
-
-##### batch_size=1 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile
-```
-
-##### batch_size=4 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --batch_input \
-    --compile
-```
-
-Sample code can be found [here](https://github.com/foundation-model-stack/fms-extras/blob/main/scripts/paged_speculative_inference.py)
+_Note: first prompt may be slower as there is a slight warmup time_
````