This is Mistral AI's Mixtral Instruct v0.1 model, quantized on 02/24/2024. The file size is slightly larger than TheBloke's version from December, and it seems to work well.
# How to quantize your own models on Windows with an RTX GPU:
## Requirements:
* Make sure you have git and python installed (if you use oobabooga etc., you are probably good to go)
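A quick way to check both from a command prompt (the exact version numbers will differ on your machine):

```
git --version
python --version
```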
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1:
# Instructions:
## Windows command prompt
* D:
* mkdir Mixtral
* git clone https://github.com/ggerganov/llama.cpp
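The convert step further down needs llama.cpp's Python dependencies. A minimal sketch, assuming the requirements.txt shipped in the repository covers them:

```
cd D:\llama.cpp
python -m pip install -r requirements.txt
```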
## Download llama.cpp
Assuming you want CUDA for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases
Latest version as of Feb 24, 2024:
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
Extract the two .zip files directly into the llama.cpp folder you just git cloned. Overwrite files as prompted.
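If you would rather do the download and extraction from the same command prompt, a rough equivalent uses the curl and tar tools that ship with recent Windows 10/11 builds (same URLs as above; the temporary .zip names are arbitrary):

```
cd D:\
rem download both release zips
curl.exe -L -o cudart.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
curl.exe -L -o llama-cublas.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
rem unpack both archives into the cloned llama.cpp folder
tar -xf cudart.zip -C llama.cpp
tar -xf llama-cublas.zip -C llama.cpp
```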
## Download Mixtral
Download the full-precision (unquantized) version of the model from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Save all of its .safetensors, .json, and .model files to D:\Mixtral\
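If you prefer the command line over the browser, the Hugging Face CLI can pull just those file types; a sketch assuming the huggingface_hub package is (or gets) installed:

```
python -m pip install -U huggingface_hub
huggingface-cli download mistralai/Mixtral-8x7B-Instruct-v0.1 --include "*.safetensors" "*.json" "*.model" --local-dir D:\Mixtral
```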
## Convert the model to fp16:
D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
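Expect the fp16 output to be big (on the order of 90 GB for Mixtral 8x7B), so make sure the target drive has enough free space.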
## Quantize the fp16 model to q5_k_m:
D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m
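The same binary handles the other GGUF quant types, and running quantize.exe with no arguments prints the full list. For example, a smaller q4_k_m build would look like this (the output filename is just a suggestion):

D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q4_k_m.gguf q4_k_m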
That's it. Load up the resulting .gguf file like you normally would.
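For a quick smoke test, the main.exe bundled in the release zip can load the file directly; the prompt, token count, and GPU layer count below are placeholders to adjust for your hardware:

D:\llama.cpp>main.exe -m D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf -p "[INST] Write a haiku about quantization. [/INST]" -n 128 -ngl 16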