JRosenkranz committed
Commit 1143687 · verified · 1 Parent(s): 3491c38

Update README.md

Files changed (1)
  1. README.md +1 -58
README.md CHANGED
````diff
@@ -70,61 +70,4 @@ pip install . --no-cache-dir
 python sample_client.py
 ```
 
-_Note: first prompt may be slower as there is a slight warmup time_
-
-### Minimal Sample
-
-*To try this out with the fms-native compiled model, please execute the following:*
-
-#### Install
-
-```bash
-git clone https://github.com/foundation-model-stack/fms-extras
-(cd fms-extras && pip install -e .)
-pip install transformers==4.35.0 sentencepiece numpy
-```
-
-#### Run Sample
-
-##### batch_size=1 (compile + cudagraphs)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile \
-    --compile_mode=reduce-overhead
-```
-
-##### batch_size=1 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile
-```
-
-##### batch_size=4 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --batch_input \
-    --compile
-```
-
-Sample code can be found [here](https://github.com/foundation-model-stack/fms-extras/blob/main/scripts/paged_speculative_inference.py)
+_Note: first prompt may be slower as there is a slight warmup time_
````