Update README.md
README.md (CHANGED)
@@ -101,11 +101,6 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
 
-## Installing the nightly version to get most recent updates
-```
-pip install git+https://github.com/EleutherAI/lm-evaluation-harness
-```
-
 ## baseline
 ```
 lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks hellaswag --device cuda:0 --batch_size 8
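The baseline and quantized evaluations in this hunk differ only in the `pretrained` argument, so a small wrapper keeps the two invocations consistent. A minimal sketch — the helper name `lm_eval_cmd` and the `subprocess` usage are illustrative, not part of the README:

```python
import shlex
import subprocess  # only needed if you actually launch the evals

def lm_eval_cmd(pretrained: str) -> list[str]:
    """Build the lm_eval command line used in this README for a given checkpoint."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", "hellaswag",
        "--device", "cuda:0",
        "--batch_size", "8",
    ]

# Print the exact command for the quantized checkpoint; to run it, pass the
# list to subprocess.run(..., check=True) on a machine with lm_eval installed.
print(shlex.join(lm_eval_cmd("pytorch/Phi-4-mini-instruct-float8dq")))
```

Running both checkpoints through the same builder guarantees the only variable between the two eval runs is the model.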
@@ -116,9 +111,6 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
-`TODO: more complete eval results`
-
-
 | Benchmark | | |
 |----------------------------------|----------------|---------------------|
 | | Phi-4 mini-Ins | phi4-mini-int4wo |
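Once both eval runs finish, the hellaswag accuracies can be compared directly. A sketch of that comparison — the accuracy values below are placeholders for illustration, not measured results:

```python
def relative_drop(acc_baseline: float, acc_quantized: float) -> float:
    """Percent accuracy lost by quantization, relative to the baseline."""
    return (acc_baseline - acc_quantized) / acc_baseline * 100

# Placeholder numbers only -- substitute the values lm_eval reports
# for each checkpoint.
baseline, quantized = 0.50, 0.49
print(f"relative drop: {relative_drop(baseline, quantized):.1f}%")  # relative drop: 2.0%
```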
@@ -155,12 +147,6 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq
 
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 
-## Download vllm source code and install vllm
-```
-git clone git@github.com:vllm-project/vllm.git
-VLLM_USE_PRECOMPILED=1 pip install .
-```
-
 ## Download dataset
 Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
 
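Since the two vLLM benchmark scripts report in different units (seconds vs. requests per second), a quick conversion helps sanity-check results against each other. Under a simple steady-state assumption — a fixed number of concurrent requests, each with the same end-to-end latency — throughput is concurrency divided by latency. A sketch with illustrative numbers:

```python
def throughput_rps(concurrency: int, latency_s: float) -> float:
    """Steady-state requests per second given per-request latency in seconds."""
    return concurrency / latency_s

# e.g. 8 concurrent requests, each taking 2 seconds end to end
print(throughput_rps(8, 2.0))  # 4.0 requests/second
```

Real serving throughput also depends on batching and queueing, so treat this only as an order-of-magnitude check.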
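The downloaded ShareGPT file is a JSON list of conversations; each record is commonly assumed to carry an `id` and a `conversations` list of `{"from", "value"}` turns (verify the schema against the file itself). A minimal sketch that extracts the first human prompt from an inline sample record:

```python
import json

# Inline sample mimicking one ShareGPT record (assumed schema).
sample = json.loads("""
[{"id": "abc-1",
  "conversations": [
    {"from": "human", "value": "What is quantization?"},
    {"from": "gpt", "value": "Reducing numeric precision of model weights."}
  ]}]
""")

def first_human_prompt(record: dict) -> str:
    """Return the first human turn in a ShareGPT-style conversation."""
    return next(t["value"] for t in record["conversations"] if t["from"] == "human")

print(first_human_prompt(sample[0]))  # What is quantization?
```

The serving benchmark feeds these prompts to the model, so the same extraction is what turns the raw dataset into benchmark requests.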