---
base_model:
- google/gemma-3-12b-it
---

About 13.5M tokens in total, mixing instruct and RP (roleplay) data.

Both RP datasets and the inkstruct subset include system prompts, formatted as a `<start_of_turn>system` turn, to help Gemma 3 learn the system role.
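
For illustration, here is a rough sketch of what a conversation with a dedicated system turn might look like once rendered. The exact template applied by `customgemma-regex` is not shown in this card, so the `format_gemma_turns` helper and its layout are assumptions based on Gemma's usual `<start_of_turn>` / `<end_of_turn>` markers, not the actual training code.

```python
def format_gemma_turns(messages):
    """Render a list of {"role", "content"} dicts as Gemma-style turns.

    Hypothetical helper: assumes each turn is wrapped as
    <start_of_turn>ROLE\\nCONTENT<end_of_turn>\\n and that a "system"
    role gets its own turn, as described above.
    """
    parts = []
    for message in messages:
        # Gemma names the assistant role "model"; other roles pass through as-is.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    return "".join(parts)


example = format_gemma_turns([
    {"role": "system", "content": "You are a terse narrator."},
    {"role": "user", "content": "Describe the tavern."},
    {"role": "assistant", "content": "Smoke, ale, low voices."},
])
print(example)
```

At inference time, a prompt built this way would typically end with an open `<start_of_turn>model` line so the model writes the next turn.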

```yaml
datasets:
  - path: ToastyPigeon/some-rp-extended
    type: customgemma-regex
  - path: allura-org/inkstructmix-v0.2.1a-system-reasoning-separated
    type: customgemma-regex
    data_files: inkstruct-system.json
    split: train[:750]
  - path: ToastyPigeon/unalign-v2
    type: customgemma-regex
    split: train[:50%]
  - path: ToastyPigeon/synth-rp
    type: customgemma-regex
    split: train[:20%]
```