---
library_name: transformers
license: mit
base_model:
- meta-llama/Meta-Llama-3-8B
---
# Model Details

A `meta-llama/Meta-Llama-3-8B` model fine-tuned on 100,000 [CLRS-Text](https://github.com/google-deepmind/clrs/tree/master/clrs/_src/clrs_text) examples.
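A minimal usage sketch with `transformers`; the repository id below is a placeholder for this checkpoint, and the prompt is illustrative rather than the exact CLRS-Text format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-3-8b-clrs-text"  # placeholder: substitute this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "..."  # a CLRS-Text-formatted question
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```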
## Training Details
- Learning rate: 1e-4 with 150 warmup steps, then cosine-decayed to 5e-6, using the AdamW optimiser
- Batch size: 128
- Loss computed over the answer tokens only, not the question (see the sketch after this list).
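A minimal sketch of answer-only loss masking, assuming a plain question/answer concatenation; the function and variable names are illustrative, not the actual training code:

```python
import torch

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def build_example(tokenizer, question: str, answer: str):
    # Tokenise question and answer separately so the answer boundary is known.
    q_ids = tokenizer(question, add_special_tokens=False)["input_ids"]
    a_ids = tokenizer(answer + tokenizer.eos_token, add_special_tokens=False)["input_ids"]

    input_ids = torch.tensor(q_ids + a_ids)
    labels = input_ids.clone()
    # Mask the question tokens so the loss is taken over the answer only.
    labels[: len(q_ids)] = IGNORE_INDEX
    return {"input_ids": input_ids, "labels": labels}
```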