Update README.md
README.md CHANGED

@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags:
+tags:
+- torchao
+license: mit
 ---
 
 [Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) int4 weight only quantization, by PyTorch team.
@@ -146,4 +148,4 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
 vllm serve jerryzh168/phi4-mini-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
+```
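For reference, a minimal sketch of loading the int4 checkpoint referenced in this README directly with transformers (outside of vLLM). The repo and tokenizer IDs are taken from the `vllm serve` command above; torchao needs to be installed, and the dtype and generation settings are illustrative assumptions rather than anything stated in the diff:

```
# Sketch: load the pre-quantized int4 checkpoint with transformers + torchao.
# Repo/tokenizer IDs come from the vllm serve command above; the dtype and
# generation settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "jerryzh168/phi4-mini-int4wo-hqq",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")

inputs = tokenizer(
    "What is int4 weight only quantization?", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

When served with the `vllm serve` command shown in the diff, the same checkpoint is instead exposed over vLLM's OpenAI-compatible HTTP API.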