ekurtic committed on
Commit e9b0930 · verified · 1 Parent(s): a49057b

Update README.md

Files changed (1): README.md (+42 -1)
README.md CHANGED
@@ -15,7 +15,48 @@ tags:
 ---
 
 # DeepSeek-R1-0528-quantized.w4a16
- ## More evals coming soon
+ 
+ ## Model Overview
+ - **Model Architecture:** DeepseekV3ForCausalLM
+   - **Input:** Text
+   - **Output:** Text
+ - **Model Optimizations:**
+   - **Activation quantization:** None
+   - **Weight quantization:** INT4
+ - **Release Date:** 05/30/2025
+ - **Version:** 1.0
+ - **Model Developers:** Red Hat (Neural Magic)
+ 
+ ### Model Optimizations
+ 
+ This model was obtained by quantizing the weights of [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) to the INT4 data type.
+ This optimization reduces the number of bits used to represent each weight from 8 to 4, cutting GPU memory requirements by approximately 50%.
+ Weight quantization also reduces disk size requirements by approximately 50%.
+
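+ As a quick sanity check on the ~50% figure, the sketch below estimates weight memory at 8-bit versus 4-bit precision. It is an illustration only: it assumes the ~671B total-parameter count reported for DeepSeek-R1 and ignores the small overhead of quantization scales and zero-points.
+ 
+ ```python
+ # Back-of-the-envelope weight-memory estimate (illustrative, not measured).
+ # Assumption: ~671B total parameters, the figure reported for DeepSeek-R1;
+ # quantization scales/zero-points and activation memory are ignored.
+ params = 671e9
+ bytes_8bit = params * 1.0  # 8 bits  -> 1 byte per weight
+ bytes_int4 = params * 0.5  # 4 bits  -> 0.5 bytes per weight
+ gib = 2**30
+ print(f"8-bit weights: ~{bytes_8bit / gib:,.0f} GiB")
+ print(f"INT4 weights:  ~{bytes_int4 / gib:,.0f} GiB ({bytes_int4 / bytes_8bit:.0%} of 8-bit)")
+ ```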
+ 
+ ## Deployment
+ 
+ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
+ 
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoTokenizer
+ 
+ model_id = "RedHatAI/DeepSeek-R1-0528-quantized.w4a16"
+ number_gpus = 8
+ 
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)
+ 
+ # Build the prompt with the model's chat template so the request
+ # matches the format the model expects.
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ 
+ llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
+ outputs = llm.generate(prompt, sampling_params)
+ 
+ generated_text = outputs[0].outputs[0].text
+ print(generated_text)
+ ```
+ 
+ vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
+
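+ As a minimal serving sketch (not part of the original card), the example below assumes the server was started with `vllm serve RedHatAI/DeepSeek-R1-0528-quantized.w4a16 --tensor-parallel-size 8` and uses vLLM's default local endpoint; the port and API key are placeholder assumptions.
+ 
+ ```python
+ # Query a vLLM OpenAI-compatible server (sketch; endpoint/port are vLLM defaults).
+ from openai import OpenAI
+ 
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # api_key is a placeholder
+ response = client.chat.completions.create(
+     model="RedHatAI/DeepSeek-R1-0528-quantized.w4a16",
+     messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
+     max_tokens=256,
+     temperature=0.6,
+ )
+ print(response.choices[0].message.content)
+ ```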
+ 
+ ## Evaluation (More evals coming soon)
 
 - unquantized baseline on GSM8k
 ```bash