nm-research committed
Commit 657d47b · verified · 1 Parent(s): 0ff72a0

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -38,7 +38,7 @@ It was evaluated on a several tasks to assess the its quality in comparison to t
 ### Model Optimizations
 
 This model was obtained by quantizing the weights and activations of [Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.3-70B-Instruct) to FP4 data type, ready for inference with vLLM>=0.9.1
-This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 25%.
+This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights and activations of the linear operators within transformers blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
 
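The corrected figure checks out: going from 16 bits to 4 bits per parameter keeps 4/16 = 25% of the original footprint, i.e., roughly a 75% reduction in disk size and GPU memory, not 25%. As a minimal sketch of how such an FP4 checkpoint could be served, assuming vLLM >= 0.9.1 as the README states; the repository id below is hypothetical and used only for illustration:

```python
# Minimal sketch: serving the FP4-quantized checkpoint with vLLM (>= 0.9.1).
# "nm-research/Llama-3.3-70B-Instruct-FP4" is a hypothetical repo id for
# illustration; substitute the actual quantized model repository.
from vllm import LLM, SamplingParams

llm = LLM(model="nm-research/Llama-3.3-70B-Instruct-FP4")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate a completion from the quantized model.
outputs = llm.generate(["Summarize the benefits of FP4 quantization."], params)
print(outputs[0].outputs[0].text)
```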