tflsxyy committed on
Commit c738d90 · verified · 1 Parent(s): 1982203

Create README.md

Files changed (1): README.md +21 −0
README.md ADDED
---
base_model:
- deepseek-ai/DeepSeek-V3
---
This model contains the first 4 layers of DeepSeek-V3, quantized in the GPTQ style (a sketch of one way to express this mixed-bit layout follows the list):
- Layer 4's routed experts are quantized to 2-bit.
- All other Linear layers are quantized to 4-bit (including MLA, the dense layers' FFN, and the shared expert).

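This kind of mixed 2-bit/4-bit layout can be expressed with GPTQModel's per-module `dynamic` overrides in `QuantizeConfig`. The snippet below is a minimal sketch, not the exact recipe used to produce this checkpoint; the module-name regex and the commented-out calibration step are assumptions.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Hypothetical sketch -- not the exact recipe used for this checkpoint.
# `dynamic` maps module-name regexes to per-module overrides, which is one
# way to get 2-bit routed experts on top of a 4-bit default.
quantize_config = QuantizeConfig(
    bits=4,        # default: 4-bit for every quantized Linear layer
    group_size=128,
    dynamic={
        # Assumed module pattern: the MoE layer is index 3 in a 4-layer model.
        r"+:.*layers\.3\.mlp\.experts\..*": {"bits": 2},
    },
)

model = GPTQModel.load(
    "/root/dataDisk/DeepSeek-V3-bf16-4layers",
    quantize_config,
    trust_remote_code=True,
)
# `calibration_dataset` (a list of calibration texts) is omitted here.
# model.quantize(calibration_dataset)
# model.save("/root/dataDisk/DeepSeek-V3-4bit-4layers")
```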
To load and run this model:
```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, get_best_device

# Local paths used by the author; point these at your own copies of the model.
pretrained_model_id = "/root/dataDisk/DeepSeek-V3-bf16-4layers"
quantized_model_id = "/root/dataDisk/DeepSeek-V3-4bit-4layers"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)

# Pick the best available device and load the quantized checkpoint.
device = get_best_device()
model = GPTQModel.load(quantized_model_id, device=device, trust_remote_code=True)

# Quick smoke test: generate 10 new tokens from a short prompt.
inputs = tokenizer("gptqmodel is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```
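Note that `model.generate(...)[0]` returns the prompt tokens followed by the newly generated ones, so the printed text begins with "gptqmodel is" and continues with up to 10 new tokens.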