Update README.md

**Acknowledgement**:
We sincerely thank our community members, [Mingyuan](https://huggingface.co/ThunderBeee) and [Zoey](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.

## Get Started

To run the models, please download them to your local machine using either `git clone` or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download):

```bash
git clone https://huggingface.co/NexaAIDev/octopus-v4-gguf
```
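
If you only need a single quantization rather than the whole repository, a minimal sketch using the `huggingface-cli` tool from the `huggingface_hub` package looks like the following (the exact GGUF filename is an assumption here; check the repository file list for the file you want):

```bash
# Install the Hugging Face Hub CLI
pip install -U huggingface_hub

# Download one GGUF file from the repo into the current directory
huggingface-cli download NexaAIDev/octopus-v4-gguf Octopus-v4-Q4_K_M.gguf --local-dir .
```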

## Run with [llama.cpp](https://github.com/ggerganov/llama.cpp) (Recommended)

1. **Clone and compile:**

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```

2. **Execute the Model:**

Run the following command in the terminal:

```bash
./main -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
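
Depending on which llama.cpp revision you built, the CLI binary may be installed as `llama-cli` instead of `main`; the flags stay the same. A sketch of the equivalent call on such a build (an assumption about your local checkout, not part of the original instructions):

```bash
# Same prompt and settings, using the llama-cli binary produced by newer llama.cpp builds
./llama-cli -m ./path/to/octopus-v4-Q4_K_M.gguf -n 256 -p "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```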

## Run with [Ollama](https://github.com/ollama/ollama)

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:

1. Install Ollama on your local machine. You can also follow the import guide in the [Ollama GitHub repository](https://github.com/ollama/ollama/blob/main/docs/import.md):

```bash
git clone https://github.com/ollama/ollama.git ollama
```

2. Locate the local Ollama directory:

```bash
cd ollama
```

3. Create a `Modelfile` in your directory:

```bash
touch Modelfile
```

4. In the Modelfile, include a `FROM` statement with the path to your local model, followed by the default parameters:

```bash
FROM ./path/to/octopus-v4-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER num_ctx 1024
PARAMETER stop <nexa_end>
```

5. Use the following command to add the model to Ollama:

```bash
ollama create octopus-v4-Q4_K_M -f Modelfile
```

6. Verify that the model has been successfully imported:

```bash
ollama ls
```

7. Run the model:

```bash
ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"
```
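
Once the model is imported, you can also query it through Ollama's local REST API instead of the CLI. A minimal sketch, assuming the Ollama server is running on its default port (11434) and the model name created above:

```bash
# Send the same router prompt to the imported model via Ollama's /api/generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-v4-Q4_K_M",
  "prompt": "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>",
  "stream": false
}'
```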

| Octopus-v4-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | 50.10 | very large, good quality |
| Octopus-v4-f16.gguf | f16 | 16 | 7.64 GB | 30.61 | extremely large |

_Quantized with llama.cpp_
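
For readers who want to reproduce or extend these files, this is a rough sketch of the usual llama.cpp GGUF workflow. The script and binary names vary between llama.cpp versions and the paths below are placeholders, so treat it as an assumption rather than the exact commands used for this release:

```bash
# Convert the original Hugging Face checkpoint (placeholder path) to an f16 GGUF file;
# convert-hf-to-gguf.py ships in the llama.cpp repository root
python convert-hf-to-gguf.py ./Octopus-v4 --outtype f16 --outfile octopus-v4-f16.gguf

# Quantize the f16 GGUF down to a smaller format such as Q4_K_M
./quantize octopus-v4-f16.gguf octopus-v4-Q4_K_M.gguf Q4_K_M
```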