alexmarques committed
Commit f33c6f8 · verified · 1 Parent(s): a6e7ead

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -30,7 +30,7 @@ tags:
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) to FP8 data type.
+This model was obtained by quantizing activations and weights of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) to FP8 data type.
 This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements (by approximately 50%) and increasing matrix-multiply compute throughput (by approximately 2x).
 Weight quantization also reduces disk size requirements by approximately 50%.
 
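For reference, FP8 weight-and-activation models of this kind are commonly produced with a one-shot recipe. The sketch below is an assumption, not the recipe from this commit: it uses llm-compressor's `QuantizationModifier` with the `FP8_DYNAMIC` scheme (per-channel FP8 weights, dynamic per-token FP8 activations), an assumed `ignore=["lm_head"]`, and a hypothetical output directory name.

```python
# Hypothetical sketch: FP8 weight + activation quantization of
# Qwen2.5-7B-Instruct with llm-compressor. Scheme, ignore list, and
# output directory are assumptions; the actual recipe is not shown here.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

model_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP8_DYNAMIC: per-channel FP8 weights, per-token dynamic FP8 activations;
# no calibration data is needed because activation scales are computed at runtime.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the recipe in a single pass (no training).
oneshot(model=model, recipe=recipe)

save_dir = "Qwen2.5-7B-Instruct-FP8-dynamic"  # assumed output name
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```

With both weights and activations in 8 bits, linear layers can execute on FP8 tensor cores in inference engines such as vLLM, which is where the roughly 2x matrix-multiply throughput gain and ~50% memory reduction cited in the README come from.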