Commit d7d758e (verified) by OptimizeLLM · 1 parent: 65be96b

Update README.md

Files changed (1): README.md (+15 -12)

README.md:
 
This is Mistral AI's Mixtral Instruct v0.1 model, quantized on 02/24/2024. The file size is slightly larger than TheBloke's version from December, and it seems to work well.

# How to quantize your own models on Windows with an RTX GPU
## Requirements
* Make sure you have git and Python installed (if you already use oobabooga or a similar front end, you are probably good to go).
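To confirm both are reachable from a command prompt before you start, you can run:

```bat
:: print the installed versions; a "not recognized" error means the tool is missing from PATH
git --version
python --version
```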
 
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1:

# Instructions

## Windows command prompt
* D:
* mkdir Mixtral
* git clone https://github.com/ggerganov/llama.cpp
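Taken together, the setup looks like this in a single command-prompt session (the final cd llama.cpp is only an assumption so that the later convert and quantize steps can be run from inside the cloned folder, as the prompts further down suggest):

```bat
:: start at the root of D:, create the model folder, and clone llama.cpp next to it
D:
cd \
mkdir Mixtral
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```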

## Download the llama.cpp Windows binaries
Assuming you want CUDA for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases

Latest version as of Feb 24, 2024:
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip

Extract the two .zip files directly into the llama.cpp folder you just git cloned. Overwrite files as prompted.
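If you prefer to stay in the terminal, the two archives can also be fetched and unpacked from the same prompt; a minimal sketch, assuming curl and tar are available (both ship with recent Windows 10/11 builds):

```bat
:: download both release zips into the cloned llama.cpp folder and extract them in place
cd /d D:\llama.cpp
curl -L -o cudart.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
curl -L -o llama-bin.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
tar -xf cudart.zip
tar -xf llama-bin.zip
```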
 
## Download Mixtral
Download the original full-precision version of the model from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 by saving all of its .safetensors, .json, and .model files to D:\Mixtral\
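One way to do that without a browser is the huggingface_hub command-line tool; a minimal sketch, assuming you install it with pip (the --include patterns simply restrict the download to the file types listed above):

```bat
:: install the Hugging Face CLI, then pull only the files needed for conversion into D:\Mixtral
pip install huggingface_hub
huggingface-cli download mistralai/Mixtral-8x7B-Instruct-v0.1 --include "*.safetensors" "*.json" "*.model" --local-dir D:\Mixtral
```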
 
## Convert the model to fp16
D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
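If convert.py stops with missing-module errors, the cloned llama.cpp repository includes a requirements.txt for its Python dependencies; installing that first usually resolves it:

```bat
:: install convert.py's Python dependencies from the cloned repository
cd /d D:\llama.cpp
pip install -r requirements.txt
```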
 
## Quantize the fp16 model to q5_k_m
D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m

  That's it. Load up the resulting .gguf file like you normally would.
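If you want a quick sanity check before loading it into your usual front end, the main.exe bundled in the release zip can run a short prompt against the new file; a minimal sketch (the -ngl layer count and the prompt are just placeholders, adjust them to your VRAM and taste):

```bat
:: load the quantized model, offload 20 layers to the GPU, and generate a short reply
cd /d D:\llama.cpp
main.exe -m D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf -ngl 20 -p "[INST] Say hello in one sentence. [/INST]"
```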