---
base_model:
- google/gemma-3-12b-it
---

About 13.5M tokens total of mixed instruct and RP data.

Both RP datasets and the inkstruct set include system prompts to help Gemma 3 understand the system role (via `<start_of_turn>system`).

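
As a rough illustration of what a system turn looks like in Gemma-style formatting, here is a minimal sketch. The `render_gemma_turns` helper and the sample messages are hypothetical, and the exact template that `customgemma-regex` applies may differ; the only parts taken from Gemma's documented format are the `<start_of_turn>` / `<end_of_turn>` markers and the `user` / `model` role names.

```python
# Hypothetical helper, not the actual training template.
# Maps chat roles to Gemma turn names; "system" is the extra role discussed above.
ROLE_TAGS = {"system": "system", "user": "user", "assistant": "model"}


def render_gemma_turns(messages):
    """Render a list of {"role", "content"} dicts as Gemma-style turns."""
    parts = []
    for message in messages:
        role = ROLE_TAGS[message["role"]]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    return "".join(parts)


# Made-up sample conversation, for illustration only.
example = [
    {"role": "system", "content": "You are a terse narrator."},
    {"role": "user", "content": "Describe the tavern."},
    {"role": "assistant", "content": "Dim, loud, and smelling of smoke."},
]

print(render_gemma_turns(example))
```

The point of the sketch is just that the system prompt gets its own turn instead of being folded into the first user message.
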
```yaml
datasets:
  - path: ToastyPigeon/some-rp-extended
    type: customgemma-regex
  - path: allura-org/inkstructmix-v0.2.1a-system-reasoning-separated
    type: customgemma-regex
    data_files: inkstruct-system.json
    split: train[:750]
  - path: ToastyPigeon/unalign-v2
    type: customgemma-regex
    split: train[:50%]
  - path: ToastyPigeon/synth-rp
    split: train[:20%]
    type: customgemma-regex
```
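
The `split` values above look like Hugging Face `datasets` slice strings: `train[:750]` keeps the first 750 rows and `train[:50%]` keeps the first half. A small sketch of that arithmetic, assuming only the `[:N]` and `[:N%]` forms used in this config. The `rows_selected` helper and the 10,000-row dataset size are hypothetical, and a real loader may round percentages differently.

```python
import re


def rows_selected(split, n_rows):
    """Count rows kept by a 'train[:N]' or 'train[:N%]' style split string."""
    match = re.fullmatch(r"\w+\[:(\d+)(%?)\]", split)
    if match is None:
        raise ValueError(f"unsupported split spec: {split!r}")
    amount = int(match.group(1))
    is_percent = bool(match.group(2))
    if is_percent:
        # Floor division is an assumption; actual rounding may differ.
        return n_rows * amount // 100
    # Assumes an absolute count is clamped to the dataset size.
    return min(amount, n_rows)


# 10_000 is a made-up row count, not the size of any dataset listed above.
for split in ("train[:750]", "train[:50%]", "train[:20%]"):
    print(split, "->", rows_selected(split, 10_000))
```
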