Update README.md

README.md (CHANGED)
@@ -22,7 +22,8 @@ This dataset is our attempt to reproduce the dataset generated for Microsoft Res
This second preview release is trained on a curated, filtered subset of most of our GPT-4 augmented data.

This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
-
+We measured this with BigBench-Hard and AGIEval results, using the same methods as the Orca paper, and find ~103% of the original Orca's performance on average.
+As well, this is achieved with ~1/10th the compute requirement and <20% of the dataset size of the original Orca paper.

We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4All Leaderboard for 13B models.
@@ -58,7 +59,7 @@ Average for AGIEval: 0.441
In the Orca paper, they measured their score relative to Vicuna on these evals.
We've done the same, and find that our scores average >103% of the total improvement shown in the Orca paper, using the same evaluation methods as outlined there.

-So we are surpassing Orca performance with <20% of the dataset size and ~1/
+So we are surpassing Orca performance with <20% of the dataset size and ~1/10th the training budget!

## BigBench-Hard Performance

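To make the comparison concrete: as we read the Orca paper's method, the relative measure is (model score - Vicuna score) / (Orca score - Vicuna score), averaged across the evals, and that ratio is what exceeds 103%. The budget claim likewise checks out against the Training figures below: 8 GPUs x 46 hours = 368 A100-hours for this model versus 20 GPUs x 200 hours = 4,000 A100-hours for Orca, about 9%, i.e. roughly 1/10th.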
@@ -82,6 +83,7 @@ We place #1 for all open models and come within comparison of text-davinci-003,



+
# Dataset

We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
@@ -90,23 +92,36 @@ Further details of our curation practices will be forthcoming with our full mode

# Training

-We trained with 8x A100-80G GPUs for
+We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine-tuning on our dataset.
This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
-Our compute requirement was
-Commodity cost was ~$
+Our compute requirement was <1/10th that of the original Orca.
+Commodity cost was ~$600.

Please await our full releases for further training details.


# Prompt Template

-We use our own prompt template which we call "
+We use our own prompt template, which we call "`OpenChat Llama2 V1`".
+
+Examples:
+```
+# Single-turn V1 Llama 2
+tokenize("User: Hello<|end_of_turn|>Assistant:")
+# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
+
+# Multi-turn V1 Llama 2
+tokenize("User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
+# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
+```

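The `tokenize` calls above are shorthand. As a minimal sketch of applying this template with `transformers` (not an official snippet: it assumes the released tokenizer registers `<|end_of_turn|>` as a special token, ID 32000 in the examples above, and the `build_prompt` helper is hypothetical):

```python
# Sketch: building an "OpenChat Llama2 V1" prompt and checking the token IDs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")

def build_prompt(turns):
    """Format (role, message) pairs in the V1 Llama 2 style shown above."""
    prompt = "".join(f"{role}: {msg}<|end_of_turn|>" for role, msg in turns)
    return prompt + "Assistant:"

prompt = build_prompt([("User", "Hello")])
print(prompt)                        # User: Hello<|end_of_turn|>Assistant:
print(tokenizer(prompt).input_ids)   # expected: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
```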
# Serving

This model is most easily served with [OpenChat's](https://github.com/imoneoi/openchat) customized vLLM OpenAI-compatible API server.
-
+This is highly recommended, as it is by far the fastest option in terms of inference speed, and it is quick and easy to set up.
+We also illustrate setup of Oobabooga/text-generation-webui below. The settings outlined there also apply to other uses of `Transformers`.


## Serving with OpenChat
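For a rough idea of what talking to this OpenAI-compatible endpoint looks like, here is a sketch using plain `requests`; the host, port, and model name are assumptions, and the OpenChat repository documents the actual server launch command and defaults:

```python
# Sketch: querying an OpenAI-compatible chat completions endpoint.
import requests

response = requests.post(
    "http://localhost:18888/v1/chat/completions",   # assumed host/port
    json={
        "model": "OpenOrcaxOpenChat-Preview2-13B",  # assumed model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```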
@@ -128,16 +143,16 @@ You may then connect to the OpenAI-compatible API endpoint with tools such as [B
## Serving with Oobabooga / text-generation-webui

The model may also be loaded via [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/) in a similar manner to other models.
-See the requirements below.
+See the requirements below. Note that inference with Transformers is significantly slower than using the recommended OpenChat vLLM server.

### Oobabooga Key Requirements

-* You will first need to download the model as you normally do to the "`models/`" folder of your text-generation-webui installation.
+* You will first need to download the model, as you normally do, into the "`models/`" folder of your `text-generation-webui` installation.
* To use the unquantized model presented here, select "`Transformers`" in the webui's "`Model`" tab "`Model loader`" dropdown.
-* You will likely want to tick "`auto-devices`". The model will require >
+* You will likely want to tick "`auto-devices`". The model will require >40GB of VRAM after loading in context for inference.
* The model was trained in bf16, so tick the "`bf16`" box for best performance.
* It will run safely on a single GPU with VRAM >=48GB (e.g. an A6000).
-* If using consumer GPUs, e.g. 2x RTX3090 24GB, you will likely want to enter "18,17" under tensor_split to split the model across both GPUs
+* If using consumer GPUs, e.g. 2x RTX 3090 24GB, you will likely want to enter "18,17" under "`tensor_split`" to split the model across both GPUs (see the loading sketch after this list).
* The model will perform significantly better if you use the appropriate prompting template.
* We will submit a PR to include our prompting template in text-generation-webui soon.
* For now, manually enter the settings described in the following sections:
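For use outside the webui, the checkboxes above translate roughly to a plain `transformers` load. This is a sketch under the stated assumptions, not part of the model card (`device_map` support requires the `accelerate` package):

```python
# Sketch: "bf16" ~ torch_dtype=torch.bfloat16, "auto-devices" ~ device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # model was trained in bf16
    device_map="auto",           # spread layers across available GPUs
    # On 2x 24GB consumer GPUs, a rough analogue of tensor_split "18,17":
    # max_memory={0: "18GiB", 1: "17GiB"},
)
```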
@@ -176,17 +191,19 @@ In the "`Text generation`" tab, select "`instruct`" as the mode:
It should look as below:
<img src="https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaLlama2OobaboogaInstructMode.png" style="width: 40%">

+Then you should be ready to generate!
+

# Citation

```bibtex
-@software{
-title = {
-author = {
+@software{OpenOrcaxOpenChatPreview2,
+  title = {OpenOrcaxOpenChatPreview2: Llama2-13B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset},
+  author = {Guan Wang and Bleys Goodson and Wing Lian and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
-howpublished = {\url{https://https://huggingface.co/Open-Orca/
+  howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B}},
}
@software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},