This is Mistral AI's Mixtral Instruct v0.1 model, quantized on 02/24/2024. The file size is slightly larger than TheBloke's version from December, and it seems to work well.
# How to quantize your own models on Windows with an RTX GPU:
## Requirements:
* Make sure you have git and python installed (if you use oobabooga etc., you are probably good to go)
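A quick way to check both from a command prompt (the exact version numbers will differ on your machine):

```
git --version
python --version
```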
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1:
# Instructions:
## Windows command prompt
* D:
* mkdir Mixtral
* git clone https://github.com/ggerganov/llama.cpp
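The convert step further down needs llama.cpp's Python dependencies. A minimal sketch, assuming the requirements.txt shipped in the repository covers them:

```
cd D:\llama.cpp
python -m pip install -r requirements.txt
```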
## Download llama.cpp
Assuming you want CUDA for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases
Latest version as of Feb 24, 2024:
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
Extract the two .zip files directly into the llama.cpp folder you just git cloned. Overwrite files as prompted.
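If you would rather do the download and extraction from the same command prompt, a rough equivalent uses the curl and tar tools that ship with recent Windows 10/11 builds (same URLs as above; the temporary .zip names are arbitrary):

```
cd D:\
rem download both release zips
curl.exe -L -o cudart.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
curl.exe -L -o llama-cublas.zip https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
rem unpack both archives into the cloned llama.cpp folder
tar -xf cudart.zip -C llama.cpp
tar -xf llama-cublas.zip -C llama.cpp
```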
## Download Mixtral
Download the full-precision (unquantized) version of the model from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Save all of its .safetensors, .json, and .model files to D:\Mixtral\
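If you prefer the command line over the browser, the Hugging Face CLI can pull just those file types; a sketch assuming the huggingface_hub package is (or gets) installed:

```
python -m pip install -U huggingface_hub
huggingface-cli download mistralai/Mixtral-8x7B-Instruct-v0.1 --include "*.safetensors" "*.json" "*.model" --local-dir D:\Mixtral
```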
## Convert the model to fp16:
D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
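Expect the fp16 output to be big (on the order of 90 GB for Mixtral 8x7B), so make sure the target drive has enough free space.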
## Quantize the fp16 model to q5_k_m:
D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m
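The same binary handles the other GGUF quant types, and running quantize.exe with no arguments prints the full list. For example, a smaller q4_k_m build would look like this (the output filename is just a suggestion):

D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q4_k_m.gguf q4_k_m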
That's it. Load up the resulting .gguf file like you normally would.
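For a quick smoke test, the main.exe bundled in the release zip can load the file directly; the prompt, token count, and GPU layer count below are placeholders to adjust for your hardware:

D:\llama.cpp>main.exe -m D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf -p "[INST] Write a haiku about quantization. [/INST]" -n 128 -ngl 16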