Commit 568787c · 1 Parent(s): 7d073b9
add improved inference options
README.md CHANGED
@@ -26,16 +26,30 @@ Available models:
|
|
| 26 |
|
| 27 |
## Inference with Google Colab and HuggingFace 🤗
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
| 31 |
You can run inference in a free Colab notebook if you select a GPU runtime. See the notebook for more details.
|
| 32 |
|
| 33 |
## Licensing and Usage
|
| 34 |
|
| 35 |
fLlama-7B:
|
| 36 |
- Llama 2 license
|
| 37 |
|
| 38 |
fLlama-13B:
|
| 39 |
- Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
|
| 40 |
|
| 41 |
- Licenses are not transferable to other users/entities.
|
| 26 |
|
| 27 |
## Inference with Google Colab and HuggingFace 🤗
|
| 28 |
|
| 29 |
+
**GPTQ (fastest + good accuracy)**
|
| 30 |
+
Get started by saving your own copy of this [function calling chatbot](https://colab.research.google.com/drive/1u8x41Jx8WWtI-nzHOgqTxkS3Q_lcjaSX?usp=sharing).
|
| 31 |
You can run inference in a free Colab notebook if you select a GPU runtime. See the notebook for more details.
|
| 32 |
|
| 33 |
+
**Bits and Bytes NF4 (slowest inference)**
|
| 34 |
+
Try out the [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).
|
| 35 |
+
|
| 36 |
+
**GGML (best for running on a laptop, great for Mac)**
|
| 37 |
+
To run this, you'll need to install llama.cpp (by ggerganov on GitHub).
|
| 38 |
+
- Download the GGML file from the GGML link under "Available models" above.
|
| 39 |
+
- I recommend running a command like:
|
| 40 |
+
|
| 41 |
+
```
|
| 42 |
+
./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
|
| 43 |
+
```
|
| 44 |
+
This lets you run a chatbot in your browser. The `-ngl` option offloads layers to the Mac's GPU (32 layers in the command above), which gives very good token generation speed, and `-c 2048` sets the context size in tokens.
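
Besides the browser chat UI, the running server can also be queried from a script. The following is a minimal sketch, not part of the original README. It assumes the server is listening on llama.cpp's default address (`http://127.0.0.1:8080`) and exposes the `/completion` endpoint, which returns the generated text in a `content` field. The helper names `build_completion_request` and `complete` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: the server started with `./server ...` listens on llama.cpp's
# default address. Adjust this if you passed --host or --port.
SERVER_URL = "http://127.0.0.1:8080"


def build_completion_request(prompt, n_predict=128, temperature=0.7):
    """Return the JSON-serialisable body for a POST to /completion."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,      # maximum number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }


def complete(prompt, server_url=SERVER_URL, **options):
    """Send a prompt to a running llama.cpp server and return the generated text."""
    body = json.dumps(build_completion_request(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{server_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generation on a laptop can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["content"]
```

With the server from the command above running, something like `print(complete("Hello", n_predict=64))` should print the model's continuation. The prompt format expected for function calling is not covered here; see the notebooks for that.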
|
| 45 |
+
|
| 46 |
## Licensing and Usage
|
| 47 |
|
| 48 |
fLlama-7B:
|
| 49 |
- Llama 2 license
|
| 50 |
|
| 51 |
fLlama-13B:
|
| 52 |
+
- For higher precision on function calling.
|
| 53 |
- Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
|
| 54 |
|
| 55 |
- Licenses are not transferable to other users/entities.
|