A collection of pre-training dataset samples at sizes of 10M, 100M, and 1B tokens, ideal for quick experimentation and ablations.
Asankhaya Sharma
codelion
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.