Update README.md
README.md CHANGED
@@ -16,3 +16,37 @@ prompt_template: '[INST] {prompt} [/INST]
'
quantized_by: OptimizeLLM
---

This is Mistral AI's Mixtral Instruct v0.1 model, quantized on 02/24/2024. The file size is slightly larger than TheBloke's version from December, and it seems to work well.

How to quantize your own models on Windows with an RTX GPU:

Requirements:
- git and python installed (if you use oobabooga etc. you are probably already good to go)

The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1:

In a Windows command prompt:
D:
mkdir Mixtral
git clone https://github.com/ggerganov/llama.cpp

Assuming you want CUDA support for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases
Latest version as of Feb 24, 2024:
https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip

Extract the two .zip files directly into the llama.cpp folder you just cloned. Overwrite files when prompted.

Download the original, unquantized model from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 by saving all of the .safetensors, .json, and .model files to D:\Mixtral\
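
If you prefer to script that download, here is a minimal sketch using the huggingface_hub Python package (an assumption; the manual download above works just as well):

```python
# pip install huggingface_hub
# Pulls only the files the conversion step needs into D:\Mixtral\.
# If the repo asks you to accept its terms first, log in with `huggingface-cli login`.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    local_dir=r"D:\Mixtral",
    allow_patterns=["*.safetensors", "*.json", "*.model"],
)
```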

Convert the model to fp16:
D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin

Quantize the fp16 model to q5_k_m:
D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m
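
If you would rather drive the convert and quantize steps from one script, here is a minimal Python sketch that simply runs the same two commands shown above (paths and filenames match the example):

```python
# Runs the convert and quantize steps above; stops if either command fails.
import subprocess

LLAMA_CPP = r"D:\llama.cpp"
MODEL_DIR = r"D:\Mixtral"
FP16 = MODEL_DIR + r"\Mixtral-8x7B-Instruct-v0.1.fp16.bin"
GGUF = MODEL_DIR + r"\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf"

# Step 1: convert the downloaded safetensors checkpoint to fp16.
subprocess.run(
    ["python", LLAMA_CPP + r"\convert.py", MODEL_DIR, "--outtype", "f16", "--outfile", FP16],
    check=True,
)

# Step 2: quantize the fp16 file to q5_k_m.
subprocess.run(
    [LLAMA_CPP + r"\quantize.exe", FP16, GGUF, "q5_k_m"],
    check=True,
)
```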

That's it. Load up the resulting .gguf file like you normally would.
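
For example, with llama-cpp-python (an assumption on my part; any GGUF-capable loader such as oobabooga works too), loading the file and using the [INST] prompt template from the header looks roughly like this:

```python
# pip install llama-cpp-python (built with CUDA support for GPU offload)
from llama_cpp import Llama

llm = Llama(
    model_path=r"D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers to the RTX GPU; lower this if you run out of VRAM
    n_ctx=4096,       # context window; illustrative value
)

# Prompt template from the model card: [INST] {prompt} [/INST]
prompt = "[INST] Summarize what q5_k_m quantization trades off. [/INST]"
result = llm(prompt, max_tokens=200)
print(result["choices"][0]["text"])
```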