---
license: gpl-3.0
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- LLaMa
- text-generation-inference
- ggml
language:
- en
- bg
- ca
- cs
- da
- de
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
---
NOTE: DEPRECATED, better-maintained quantizations of this model are available from others now.

LLaMa 65B converted to ggml via llama.cpp, then quantized to 4-bit.
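
For reference, the conversion pipeline looked roughly like the following; the paths and the exact invocations are assumptions, as script names and quantize arguments varied across llama.cpp revisions at the time:

```
# Hypothetical paths; run from a llama.cpp checkout of that era.
# Convert the original PyTorch weights to a ggml f16 file.
python convert.py /path/to/LLaMA/65B/

# Quantize the f16 file to 4-bit (q4_0).
./quantize /path/to/LLaMA/65B/ggml-model-f16.bin ggml-LLaMa-65B-q4_0.bin q4_0
```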

The legacy file is for llama.cpp builds older than https://github.com/ggerganov/llama.cpp/pull/1508; the regular file is faster but does not work on those older versions.

I recommend the following settings as a good starting point:

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -n -1 -t 32 -c 2048 --temp 0.7 --repeat_penalty 1.2 --mirostat 2 --interactive-first --color
```
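
Briefly: `-n -1` generates until you stop it, `-t 32` sets the thread count (match it to your CPU cores), `-c 2048` uses the model's full context length, `--mirostat 2` enables Mirostat v2 sampling, and `--interactive-first` drops you into interactive mode before any generation starts.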

Be aware that LLaMa is a text-generation model, not a conversational one, so you will have to prompt it differently than, for example, Vicuna or ChatGPT: give it text to continue rather than an instruction to follow.
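
For example, a completion-style prompt that leads directly into the text you want (the prompt wording here is just an illustration):

```
main.exe -m ggml-LLaMa-65B-q4_0.bin -c 2048 -n 256 --temp 0.7 --repeat_penalty 1.2 -p "The three most important considerations when quantizing a large language model are:"
```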