Continued pretraining of Llama 3.1 8B on RefinedWeb for ~80M tokens, intended to undo the annealing step and make the model behave more like a true base model.
From Our Page (community)
AI & ML interests: None defined yet.
Models (12)
- from-our-page/hillary-clinton-emails-wikileaks (Updated)
- from-our-page/BigLlama-3.0-120B (Updated)
- from-our-page/BigLlama-2-120B: Text Generation, 120B params, 5 downloads (Updated)
- from-our-page/Llama-4-Maverick-17B-128E-mlx-4bit: Text Generation, 63B params, 174 downloads, 1 like (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-5120: 8B params, 6 downloads (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-4480: 8B params, 5 downloads (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-3840: 8B params, 5 downloads (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-3200: 8B params, 5 downloads (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-2560: 8B params, 5 downloads (Updated)
- from-our-page/llama3.1-8b-refinedbase-checkpoint-1920: 8B params, 5 downloads (Updated)
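The refinedbase checkpoints above are numbered 1920 through 5120 in even intervals of 640. Assuming those suffixes are optimizer steps and the final checkpoint (5120) corresponds to the full ~80M-token RefinedWeb budget (both assumptions, not stated on the page), the implied tokens-per-step and per-checkpoint token counts work out as:

```python
# Checkpoint steps taken from the model names in the listing above.
checkpoint_steps = [1920, 2560, 3200, 3840, 4480, 5120]

# Assumption: checkpoint suffixes are optimizer steps, and the final
# checkpoint covers the full ~80M-token continued-pretraining budget.
total_tokens = 80_000_000
tokens_per_step = total_tokens / checkpoint_steps[-1]  # 15625.0 tokens/step

for step in checkpoint_steps:
    print(f"checkpoint-{step}: ~{step * tokens_per_step / 1e6:.0f}M tokens")
# prints checkpoint-1920: ~30M tokens ... checkpoint-5120: ~80M tokens
```

So under these assumptions the six released checkpoints span roughly the 30M–80M-token portion of the run, saved every ~10M tokens.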