tonyzhao123/dummy_llama4_b16
This is a checkpoint from step 12136 of a custom Llama4 training run.
Model Details
- Base Model: meta-llama/Llama-4-Scout-17B-16E
- Model Type: llama4
- Architecture: Llama4ForConditionalGeneration
- Training Step: 12136
- Source Checkpoint: checkpoint-12136
- Data Type: bfloat16
Model Configuration
- Hidden Size: 768
- Number of Layers: 8
- Number of Experts (MoE): 4
- Vocabulary Size: 202048
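The values above can be checked against a local copy of config.json. The following is a minimal sketch, not part of the repository. It assumes the Llama4 layout used by transformers, where the text-model settings sit under a `text_config` key and the expert count is stored as `num_local_experts`. The names `EXPECTED`, `check_config`, and `check_config_file` are hypothetical helpers introduced for illustration.

```python
import json
from pathlib import Path

# Values listed in this model card. The key names are an assumption based on
# the transformers Llama4 text config; adjust them if your config.json differs.
EXPECTED = {
    "hidden_size": 768,
    "num_hidden_layers": 8,
    "num_local_experts": 4,
    "vocab_size": 202048,
}


def check_config(config: dict) -> dict:
    """Return {key: (expected, found)} for every value that does not match."""
    # Assumption: text-model settings are nested under "text_config";
    # fall back to the top level when that key is absent.
    text_config = config.get("text_config", config)
    mismatches = {}
    for key, expected in EXPECTED.items():
        found = text_config.get(key)
        if found != expected:
            mismatches[key] = (expected, found)
    return mismatches


def check_config_file(path: str) -> dict:
    """Load a config.json from disk and compare it with EXPECTED."""
    return check_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty result from `check_config_file("config.json")` would mean the downloaded configuration agrees with the figures in this card.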
Data Type
This model has been converted to bfloat16 format for efficient inference and reduced memory usage.
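To make the memory saving concrete, the sketch below estimates the size of a single token-embedding matrix from the vocabulary size and hidden size listed in this card. It is an illustration only: the helper names are hypothetical, the card does not state the total parameter count, and whether the input and output embeddings share weights is not specified, so the figures cover one matrix rather than the whole model.

```python
# Bytes used to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of parameters in one vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


def size_mib(num_params: int, dtype: str) -> float:
    """Storage size in MiB for num_params parameters of the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


params = embedding_params(vocab_size=202048, hidden_size=768)
print(params)                        # 155172864
print(size_mib(params, "bfloat16"))  # 295.96875
print(size_mib(params, "float32"))   # 591.9375
```

Storing the same matrix in bfloat16 takes half the space of float32, which is where the reduced memory usage comes from.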
Usage
from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_name = "tonyzhao123/dummy_llama4_b16"

# Load the tokenizer and the bfloat16 weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
text = "Hello, how are you today?"
# Move the encoded inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Training Information
This checkpoint was extracted from training step 12136. The model was trained using custom scripts with on-the-fly tokenization on the WikiText-103 dataset.
Files Included
- config.json: Model configuration
- model.safetensors: Model weights (single file, no sharding)
- tokenizer.json: Fast tokenizer
- tokenizer_config.json: Tokenizer configuration
- special_tokens_map.json: Special tokens mapping
- generation_config.json: Generation parameters (if available)
- chat_template.jinja: Chat template (if available)
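After downloading the repository to a local directory, the file list above can be verified with a short script. This is a minimal sketch using only the standard library. The function names and the split into required and optional files are assumptions made for illustration, with the two files marked "if available" treated as optional.

```python
from pathlib import Path

# Files this card lists without qualification.
REQUIRED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

# Files this card marks "if available".
OPTIONAL_FILES = [
    "generation_config.json",
    "chat_template.jinja",
]


def missing_files(directory: str) -> list:
    """Return the required files that are not present in the directory."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def present_optional_files(directory: str) -> list:
    """Return the optional files that are present in the directory."""
    root = Path(directory)
    return [name for name in OPTIONAL_FILES if (root / name).is_file()]
```

An empty list from `missing_files` indicates that every required file was found.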
Limitations
- This is an intermediate checkpoint and may not represent the final trained model
- Output quality depends on the training step; other checkpoints from the same run may behave differently
- Always evaluate the model on your specific use case
Citation
@misc{tonyzhao123_dummy_llama4_b16_checkpoint_12136,
title={tonyzhao123/dummy_llama4_b16 - Checkpoint 12136},
author={Your Name},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/tonyzhao123/dummy_llama4_b16}
}