Update README.md

README.md (CHANGED)
@@ -22,7 +22,8 @@ This dataset is our attempt to reproduce the dataset generated for Microsoft Res
This second preview release is trained on a curated, filtered subset of most of our GPT-4 augmented data.

This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
-
+We measured this with BigBench-Hard and AGIEval results, using the same methods as the Orca paper, and find ~103% of the original Orca's performance on average.
+As well, this is achieved with ~1/10th the compute requirement and <20% of the dataset size of the original Orca paper.

We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4All Leaderboard for 13B models.
@@ -58,7 +59,7 @@ Average for AGIEval: 0.441
In the Orca paper, they measured their score relative to Vicuna on these evals.
We've done the same, and find that our scores average >103% of the total improvement shown in the Orca paper, using the same evaluation methods as outlined there.

-So we are surpassing Orca performance with <20% of the dataset size and ~1/
+So we are surpassing Orca performance with <20% of the dataset size and ~1/10th the training budget!

## BigBench-Hard Performance

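To make the comparison concrete: as we read the Orca paper's method, the relative measure is (model score - Vicuna score) / (Orca score - Vicuna score), averaged across the evals, and that ratio is what exceeds 103%. The budget claim likewise checks out against the Training figures below: 8 GPUs x 46 hours = 368 A100-hours for this model versus 20 GPUs x 200 hours = 4,000 A100-hours for Orca, about 9%, i.e. roughly 1/10th.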
@@ -82,6 +83,7 @@ We place #1 for all open models and come within comparison of text-davinci-003,



+
# Dataset

We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
@@ -90,23 +92,36 @@ Further details of our curation practices will be forthcoming with our full mode

# Training

-We trained with 8x A100-80G GPUs for
+We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine-tuning on our dataset.
This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
-Our compute requirement was
-Commodity cost was ~$
+Our compute requirement was <1/10th that of the original Orca.
+Commodity cost was ~$600.

Please await our full releases for further training details.


# Prompt Template

-We use our own prompt template which we call "
+We use our own prompt template, which we call "`OpenChat Llama2 V1`".
+
+Examples:
+```
+# Single-turn V1 Llama 2
+tokenize("User: Hello<|end_of_turn|>Assistant:")
+# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
+
+# Multi-turn V1 Llama 2
+tokenize("User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
+# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
+```

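The `tokenize` calls above are shorthand. As a minimal sketch of applying this template with `transformers` (not an official snippet: it assumes the released tokenizer registers `<|end_of_turn|>` as a special token, ID 32000 in the examples above, and the `build_prompt` helper is hypothetical):

```python
# Sketch: building an "OpenChat Llama2 V1" prompt and checking the token IDs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")

def build_prompt(turns):
    """Format (role, message) pairs in the V1 Llama 2 style shown above."""
    prompt = "".join(f"{role}: {msg}<|end_of_turn|>" for role, msg in turns)
    return prompt + "Assistant:"

prompt = build_prompt([("User", "Hello")])
print(prompt)                        # User: Hello<|end_of_turn|>Assistant:
print(tokenizer(prompt).input_ids)   # expected: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
```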
# Serving

This model is most easily served with [OpenChat's](https://github.com/imoneoi/openchat) customized vLLM OpenAI-compatible API server.
-
+This is highly recommended, as it is by far the fastest option in terms of inference speed, and it is quick and easy to set up.
+We also illustrate setup of Oobabooga/text-generation-webui below. The settings outlined there also apply to other uses of `Transformers`.


## Serving with OpenChat
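For a rough idea of what talking to this OpenAI-compatible endpoint looks like, here is a sketch using plain `requests`; the host, port, and model name are assumptions, and the OpenChat repository documents the actual server launch command and defaults:

```python
# Sketch: querying an OpenAI-compatible chat completions endpoint.
import requests

response = requests.post(
    "http://localhost:18888/v1/chat/completions",   # assumed host/port
    json={
        "model": "OpenOrcaxOpenChat-Preview2-13B",  # assumed model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```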
@@ -128,16 +143,16 @@ You may then connect to the OpenAI-compatible API endpoint with tools such as [B
## Serving with Oobabooga / text-generation-webui

The model may also be loaded via [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/) in a similar manner to other models.
-See the requirements below.
+See the requirements below. Note that inference with Transformers is significantly slower than using the recommended OpenChat vLLM server.

### Oobabooga Key Requirements

-* You will first need to download the model as you normally do to the "`models/`" folder of your text-generation-webui installation.
+* You will first need to download the model, as you normally do, into the "`models/`" folder of your `text-generation-webui` installation.
* To use the unquantized model presented here, select "`Transformers`" in the webui's "`Model`" tab "`Model loader`" dropdown.
-* You will likely want to tick "`auto-devices`". The model will require >
+* You will likely want to tick "`auto-devices`". The model will require >40GB of VRAM after loading in context for inference.
* The model was trained in bf16, so tick the "`bf16`" box for best performance.
* It will run safely on a single GPU with VRAM >=48GB (e.g. an A6000).
-* If using consumer GPUs, e.g. 2x RTX3090 24GB, you will likely want to enter "18,17" under tensor_split to split the model across both GPUs
+* If using consumer GPUs, e.g. 2x RTX 3090 24GB, you will likely want to enter "18,17" under "`tensor_split`" to split the model across both GPUs (see the loading sketch after this list).
* The model will perform significantly better if you use the appropriate prompting template.
* We will submit a PR to include our prompting template in text-generation-webui soon.
* For now, manually enter the settings described in the following sections:
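For use outside the webui, the checkboxes above translate roughly to a plain `transformers` load. This is a sketch under the stated assumptions, not part of the model card (`device_map` support requires the `accelerate` package):

```python
# Sketch: "bf16" ~ torch_dtype=torch.bfloat16, "auto-devices" ~ device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # model was trained in bf16
    device_map="auto",           # spread layers across available GPUs
    # On 2x 24GB consumer GPUs, a rough analogue of tensor_split "18,17":
    # max_memory={0: "18GiB", 1: "17GiB"},
)
```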
@@ -176,17 +191,19 @@ In the "`Text generation`" tab, select "`instruct`" as the mode:
It should look as below:
<img src="https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaLlama2OobaboogaInstructMode.png" style="width: 40%">

+Then you should be ready to generate!
+

# Citation

```bibtex
-@software{
-title = {
-author = {
+@software{OpenOrcaxOpenChatPreview2,
+  title = {OpenOrcaxOpenChatPreview2: Llama2-13B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset},
+  author = {Guan Wang and Bleys Goodson and Wing Lian and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
-howpublished = {\url{https://https://huggingface.co/Open-Orca/
+  howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B}},
}
@software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},