---
language:
- en
license: mit
model-index:
- name: Nano-Llama
  results: []
tags:
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb
library_name: transformers
---

# Nano-Llama

A compact 67M-parameter LLaMA-2-style language model pretrained on the FineWeb dataset.

## Model Details

- **Architecture**: LLaMA-2-style transformer
- **Parameters**: 67M
- **Training Data**: FineWeb dataset (~100M tokens)
- **Context Length**: 1024 tokens
- **Layers**: 6
- **Hidden Size**: 768
- **Attention Heads**: 12
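
The hyperparameters above roughly correspond to the sketch below. Note that `vocab_size`, `intermediate_size`, and weight tying are assumptions (they are not listed above); the authoritative values are in the repository's `config.json`.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical configuration matching the numbers above. vocab_size,
# intermediate_size, and tie_word_embeddings are guesses -- check the
# repository's config.json for the values actually used.
config = LlamaConfig(
    vocab_size=32000,            # assumption
    hidden_size=768,
    intermediate_size=2048,      # assumption
    num_hidden_layers=6,
    num_attention_heads=12,
    max_position_embeddings=1024,
    tie_word_embeddings=True,    # assumption
)

model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # lands around 67M under these assumptions
```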

## Training

- **Dataset**: FineWeb (high-quality filtered web text)
- **Tokens Trained**: ~110M
- **Training Time**: ~6 hours on a single RTX 3090
- **Optimizer**: AdamW
- **Learning Rate**: 1e-4
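
For context, a minimal sketch of a pretraining loop with these settings is shown below: FineWeb is streamed through the tokenizer and optimized with AdamW at a 1e-4 learning rate. This is an illustration only, not the actual training script; the batch size, step count, and lack of sequence packing or a learning-rate schedule are assumptions, and the real run pretrained the model from scratch rather than from the released checkpoint.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# For simplicity this loads the released checkpoint; the original run
# started from a randomly initialized LLaMA-style model instead.
tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama").to(device)
model.train()

# Stream FineWeb so the dataset never has to be fully downloaded.
dataset = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

max_steps = 100  # assumption: a tiny demo run, nowhere near ~110M tokens
for step, example in enumerate(dataset):
    if step >= max_steps:
        break
    batch = tokenizer(
        example["text"],
        truncation=True,
        max_length=1024,  # matches the 1024-token context length
        return_tensors="pt",
    ).to(device)
    # Standard causal-LM objective: labels are the inputs themselves,
    # and the model shifts them internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```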

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("vishesh-t27/Nano-Llama")
model = AutoModelForCausalLM.from_pretrained("vishesh-t27/Nano-Llama")
model.eval()

# Test prompt
text = "The future of artificial intelligence is"
inputs = tokenizer(text, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.8,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
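
For quick experiments, the checkpoint should also work through the high-level `pipeline` API (assuming it loads with `AutoModelForCausalLM` as above):

```python
from transformers import pipeline

# Text-generation pipeline wrapping the same model and tokenizer
generator = pipeline("text-generation", model="vishesh-t27/Nano-Llama")
result = generator(
    "The future of artificial intelligence is",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```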

## Limitations

- Small model size (67M parameters), which limits fluency, factual accuracy, and reasoning
- Limited training data (~110M tokens) compared to larger models
- May generate repetitive or nonsensical text

## License

This model is released under the MIT License.