---
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
inference: false
language:
- fr
- it
- de
- es
- en
license: apache-2.0
model_creator: Mistral AI_
model_name: Mixtral 8X7B Instruct v0.1
model_type: mixtral
prompt_template: '[INST] {prompt} [/INST]
'
quantized_by: OptimizeLLM
---
This is Mistral AI's Mixtral 8x7B Instruct v0.1 model, quantized to GGUF (q5_k_m) on 02/24/2024 using the steps below, and it runs well in llama.cpp.
# How to quantize your own models with Windows and an RTX GPU:
## Requirements:
* git
* python
## Instructions:
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1.
## Windows command prompt - folder setup and cloning llama.cpp
* D:
* mkdir Mixtral
* git clone https://github.com/ggerganov/llama.cpp
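Taken together, the setup commands look like this in a command prompt (a minimal sketch; adjust the drive letter and folder name if your layout differs):

```bat
REM Switch to the D: drive and create a working folder for the model files
D:
mkdir Mixtral

REM Clone llama.cpp to D:\llama.cpp; it provides the conversion script
REM and will hold the quantization executables downloaded next
git clone https://github.com/ggerganov/llama.cpp
```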
## Download the llama.cpp binaries
Assuming you want CUDA support for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases
### Latest version as of Feb 24, 2024:
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
Extract the two .zip files directly into the llama.cpp folder you just git cloned. Overwrite files as prompted.
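If you would rather script this step, the tar that ships with Windows 10 and later can unpack .zip archives from the command prompt (a sketch assuming the two archives were saved to D:\):

```bat
REM Extract both release archives into the cloned llama.cpp folder,
REM overwriting any files that already exist
tar -xf D:\cudart-llama-bin-win-cu12.2.0-x64.zip -C D:\llama.cpp
tar -xf D:\llama-b2253-bin-win-cublas-cu12.2.0-x64.zip -C D:\llama.cpp
```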
## Download Mixtral
* Download the full-precision version of the model by downloading all .safetensors, .json, and .model files to D:\Mixtral\:
* https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
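Grabbing the roughly two dozen files by hand is tedious; if you have Python installed, the huggingface_hub CLI can fetch them all in one go (a sketch assuming you are logged in to a Hugging Face account that can access the repo):

```bat
REM Install the Hugging Face CLI, then download the model files to D:\Mixtral
REM (--local-dir-use-symlinks False writes real files instead of cache symlinks)
pip install huggingface_hub
huggingface-cli download mistralai/Mixtral-8x7B-Instruct-v0.1 --local-dir D:\Mixtral --local-dir-use-symlinks False
```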
## Windows command prompt - Convert the model to fp16:
* D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
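convert.py relies on llama.cpp's Python dependencies, so install those first. The whole step, sketched out:

```bat
REM Install convert.py's dependencies (numpy, sentencepiece, gguf, etc.),
REM then merge the .safetensors shards into a single fp16 GGUF file
cd /d D:\llama.cpp
pip install -r requirements.txt
python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
```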
## Windows command prompt - Quantize the fp16 model to q5_k_m:
* D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m
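Before calling it done, you can sanity-check the quantized file with the main.exe bundled in the release you extracted (a sketch; -ngl sets how many layers to offload to the GPU, so lower it if you run out of VRAM):

```bat
REM Load the q5_k_m model and generate a short completion using the
REM model's [INST] ... [/INST] prompt template
D:\llama.cpp\main.exe -m D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf -p "[INST] Say hello. [/INST]" -n 64 -ngl 16
```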
That's it!