prompt_template: '[INST] {prompt} [/INST]

'
quantized_by: OptimizeLLM
---

This is Mistral AI's Mixtral 8x7B Instruct v0.1 model, quantized to GGUF on 02/24/2024. The file size is slightly larger than TheBloke's version from December, and it seems to work well.

How to quantize your own models on Windows with an RTX GPU:

Requirements:
- git and python installed (if you use oobabooga etc. you are probably already good to go)

The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1.

In a Windows command prompt:

D:
mkdir Mixtral
git clone https://github.com/ggerganov/llama.cpp

Cloning the repo gives you convert.py (the Python converter); the prebuilt release binaries below provide quantize.exe, so no compiling is required.

Assuming you want CUDA for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases

Latest version as of Feb 24, 2024:
- https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
- https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip

Extract both .zip files directly into the llama.cpp folder you just cloned, overwriting files as prompted.

Download the full-blast version of the model: grab all of the .safetensors, .json, and .model files from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 and save them to D:\Mixtral\
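
If you'd rather script the download, here is a minimal sketch using the huggingface_hub Python package (my choice for the example; downloading the files by hand from the link above works just as well, and the filename filters simply mirror the file types listed):

```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Fetch only the weights, configs, and tokenizer files needed for conversion.
snapshot_download(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    local_dir="D:/Mixtral",
    allow_patterns=["*.safetensors", "*.json", "*.model"],
)
```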

Convert the model to fp16:

D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin

If convert.py complains about missing Python packages, pip install -r requirements.txt from inside the llama.cpp folder should sort it out.

Quantize the fp16 model to q5_k_m:

D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m

Other quantization types (q4_k_m, q8_0, and so on) work the same way; running quantize.exe with no arguments prints the full list.

That's it. Load up the resulting .gguf file like you normally would.
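
For a quick sanity check of the finished file, here is a minimal sketch using the llama-cpp-python package (my choice for the example; any GGUF-capable loader such as oobabooga works too). It applies the prompt template from the model card:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers=-1 offloads every layer to the GPU.
llm = Llama(
    model_path="D:/Mixtral/Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
)

# The model card's prompt template: [INST] {prompt} [/INST]
output = llm("[INST] Explain GGUF quantization in one sentence. [/INST]", max_tokens=128)
print(output["choices"][0]["text"])
```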