Update README.md
README.md CHANGED
@@ -6,7 +6,8 @@ pipeline_tag: text-generation
## Model Details

-This model is an mixed int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3.1-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) generated by [intel/auto-round](https://github.com/intel/auto-round)
+This model is a mixed int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3.1-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) generated by [intel/auto-round](https://github.com/intel/auto-round) via RTN (no algorithm tuning).
+Non-expert layers fall back to 8 bits. Please refer to the "Generate the model" section for more details.
Please follow the license of the original model.

## How To Use
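The "How To Use" body is elided by this diff; for orientation, here is a minimal inference sketch. It assumes auto-round is installed, uses the tmp_autoround output directory produced by the "Generate the model" snippet below (a hub repo id for the quantized model would work the same way), and the prompt is an arbitrary example, not taken from this commit:

```python
# Minimal sketch: load and run the auto_round-format checkpoint with plain
# Transformers. "tmp_autoround" is the output_dir used later in this README;
# substitute the hub repo id of the quantized model if it has been uploaded.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "tmp_autoround"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype="auto")

prompt = "There is a girl who likes adventure,"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```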
@@ -70,16 +71,15 @@ Some of the most popular adventures include:
```

### Generate the model
+This PR is required if the model is FP8 and the device supports FP8: https://github.com/intel/auto-round/pull/750
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
+from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-V3.1-Base"

-tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cpu", torch_dtype="auto")

layer_config = {}
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
@@ -90,9 +90,7 @@ for n, m in model.named_modules():
            layer_config[n] = {"bits": 8}
            print(n, 8)

-
-
-autoround = AutoRound(model, tokenizer, iters=0, layer_config=layer_config)
+autoround = AutoRound(model_name, iters=0, layer_config=layer_config)
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
```
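The @@ -90 hunk elides the condition that decides which Linear layers fall back to 8 bits, and iters=0 means AutoRound skips its tuning loop entirely, i.e. plain round-to-nearest (RTN), which is what "no algorithm tuning" refers to above. A sketch of one plausible selection rule, assuming non-expert layers are identified by the absence of "experts" in the module name; the commit's exact test is not shown:

```python
# Hypothetical reconstruction of the elided per-layer config: every Linear
# outside the MoE expert stacks is forced to 8 bits, while experts keep the
# 4-bit default. The name test below is an assumption, not the commit's code.
import torch

layer_config = {}
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
        if "experts" not in n:  # assumed non-expert test
            layer_config[n] = {"bits": 8}
            print(n, 8)
```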