---
datasets:
- Photolens/alpaca-cleaned-airoboros-2.1-no-code-oasst1-en-merged
language:
- en
---
## Model overview

This model is fine-tuned from the base model *[Marx-3B-V2](https://huggingface.co/acrastt/Marx-3B-V2)* on *[a merged dataset of oasst1-en, alpaca-cleaned, and airoboros-2.1-no-code](https://huggingface.co/datasets/Photolens/alpaca-cleaned-airoboros-2.1-no-code-oasst1-en-merged)*.

- License: `Creative-Commons-Attribution-4.0`
- Language: `en`
- Size: `3.43B params`
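
A minimal loading-and-generation sketch, assuming the standard `transformers` API (the repo id below is a placeholder, not this model's actual id):

```python
# Placeholder repo id; substitute the actual model id for this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/your-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a prompt in this model's template (see the next section) and generate.
prompt = "### HUMAN:\nWhat is LoRA?\n\n### RESPONSE:\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```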
## Prompt template

```
### SYSTEM:
<system_prompt_here>

### HUMAN:
<prompter_message_here>

### INPUT:
<input_text_here>

### RESPONSE:
<leave_a_blank_line_here>
```

*Note: if you don't have a system prompt or input text, omit those section headers from the prompt entirely.*
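
As an illustration, a small helper that assembles prompts in this format (the `build_prompt` function is hypothetical, not part of this repository):

```python
from typing import Optional

# Hypothetical helper: builds a prompt in the template above,
# omitting the SYSTEM/INPUT sections when they are not provided.
def build_prompt(human: str, system: Optional[str] = None,
                 input_text: Optional[str] = None) -> str:
    parts = []
    if system:
        parts.append(f"### SYSTEM:\n{system}\n")
    parts.append(f"### HUMAN:\n{human}\n")
    if input_text:
        parts.append(f"### INPUT:\n{input_text}\n")
    parts.append("### RESPONSE:\n")  # the model completes after this header
    return "\n".join(parts)

print(build_prompt("Summarize LoRA in one sentence."))
```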
## Training Details

This model took `2:40:54` to train with LoRA on a single `A100 40GB` GPU, using the hyperparameters below (a configuration sketch follows the list).
- *epochs*: `1`
- *train batch size*: `8`
- *eval batch size*: `8`
- *gradient accumulation steps*: `1`
- *max gradient norm*: `0.3`
- *learning rate*: `2e-4`
- *weight decay*: `0.001`
- *optimizer*: `paged_adamw_32bit`
- *learning rate schedule*: `cosine`
- *warmup ratio (linear)*: `0.03`
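
A minimal sketch of a matching `transformers`/`peft` setup; the LoRA rank, alpha, and output path are assumptions not stated in this card:

```python
# Sketch of a LoRA training configuration matching the hyperparameters above.
from peft import LoraConfig
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./marx-3b-v2-lora",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
)

lora_config = LoraConfig(
    r=16,             # assumed rank; not stated in the card
    lora_alpha=32,    # assumed alpha; not stated in the card
    task_type="CAUSAL_LM",
)
```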