Update README.md
README.md
CHANGED
````diff
@@ -70,61 +70,4 @@ pip install . --no-cache-dir
 python sample_client.py
 ```
 
-_Note: first prompt may be slower as there is a slight warmup time_
-
-### Minimal Sample
-
-*To try this out with the fms-native compiled model, please execute the following:*
-
-#### Install
-
-```bash
-git clone https://github.com/foundation-model-stack/fms-extras
-(cd fms-extras && pip install -e .)
-pip install transformers==4.35.0 sentencepiece numpy
-```
-
-#### Run Sample
-
-##### batch_size=1 (compile + cudagraphs)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile \
-    --compile_mode=reduce-overhead
-```
-
-##### batch_size=1 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --compile
-```
-
-##### batch_size=4 (compile)
-
-```bash
-python fms-extras/scripts/paged_speculative_inference.py \
-    --variant=13b \
-    --model_path=/path/to/model_weights/llama/codellama-13B-F \
-    --model_source=hf \
-    --tokenizer=/path/to/llama/13B-F \
-    --speculator_path=ibm-fms/codellama-13b-accelerator \
-    --speculator_source=hf \
-    --batch_input \
-    --compile
-```
-
-Sample code can be found [here](https://github.com/foundation-model-stack/fms-extras/blob/main/scripts/paged_speculative_inference.py)
+_Note: first prompt may be slower as there is a slight warmup time_
````