---
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
inference: false
language:
- fr
- it
- de
- es
- en
license: apache-2.0
model_creator: Mistral AI_
model_name: Mixtral 8X7B Instruct v0.1
model_type: mixtral
prompt_template: '[INST] {prompt} [/INST]

  '
quantized_by: OptimizeLLM
---

This is Mistral AI's Mixtral 8x7B Instruct v0.1 model, quantized to GGUF with llama.cpp on 02/24/2024. It works well.

# How to quantize your own models with Windows and an RTX GPU:

## Requirements:
* git 
* python
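
Both tools need to be on your PATH. A quick sanity check from a command prompt (the exact versions you see will differ; anything reasonably recent should be fine):

```
git --version
python --version
```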

## Instructions:
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1.

## Windows command prompt - folder setup and git clone llama.cpp
* D:
* mkdir Mixtral
* git clone https://github.com/ggerganov/llama.cpp

## Download llama.cpp
Assuming you want CUDA support for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases

### Latest version as of Feb 24, 2024:
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
* https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip

Extract the two .zip files directly into the llama.cpp folder you just git cloned. Overwrite files as prompted.
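
To sanity-check the extraction, you can confirm that the llama.cpp executables and the CUDA runtime DLLs ended up side by side in the same folder (exact file names vary between releases; this is just an illustrative check):

```
D:\llama.cpp>dir *.exe
D:\llama.cpp>dir *.dll
```

`quantize.exe` (used below) and the cudart/cublas DLLs should all show up in that listing.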

## Download Mixtral
* Download the full-precision (unquantized) version of the model by downloading all .safetensors, .json, and .model files to D:\Mixtral\ (or use the command-line sketch after this list):
* https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
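
If you prefer to script the download instead of clicking through the file list, here is a hedged sketch using the `huggingface-cli` tool from the `huggingface_hub` package (the `--include` patterns are an assumption about which files you need; adjust them if the repo layout changes):

```
pip install -U "huggingface_hub[cli]"
huggingface-cli download mistralai/Mixtral-8x7B-Instruct-v0.1 --include "*.safetensors" "*.json" "*.model" --local-dir D:\Mixtral
```

If the repo requires accepting the model's terms, log in first with `huggingface-cli login`.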


## Windows command prompt - Convert the model to fp16:
* D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin

## Windows command prompt - Quantize the fp16 model to q5_k_m:
* D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m

That's it!
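
To try the result, you can load the quantized GGUF with the `main.exe` binary that ships in the same llama.cpp release zip, using the `[INST] ... [/INST]` prompt template from the metadata above. A minimal sketch (the `-ngl` value is only a placeholder for how many layers fit in your VRAM):

```
D:\llama.cpp>main.exe -m D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf -ngl 20 -c 4096 -p "[INST] Summarize what q5_k_m quantization trades off. [/INST]"
```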