tonyzhao123/dummy_llama4_b16

This is a checkpoint from step 12136 of a custom Llama 4 training run.

Model Details

Base Model: meta-llama/Llama-4-Scout-17B-16E
Model Type: llama4
Architecture: Llama4ForConditionalGeneration
Training Step: 12136
Source Checkpoint: checkpoint-12136
Data Type: bfloat16

Model Configuration

Hidden Size: 768
Number of Layers: 8
Number of Experts (MoE): 4
Vocabulary Size: 202048
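
As a rough sanity check on the figures above, the token embedding table alone can be sized from the vocabulary and hidden dimensions. The sketch below is plain arithmetic; it assumes 2 bytes per bfloat16 value and makes no claim about whether the output projection shares (ties) these weights, which this card does not state.

```python
# Values taken from the Model Configuration list above.
VOCAB_SIZE = 202048
HIDDEN_SIZE = 768
BYTES_PER_BF16 = 2  # assumption: each bfloat16 value occupies 16 bits

# One embedding vector of length HIDDEN_SIZE per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
embedding_bytes = embedding_params * BYTES_PER_BF16

print(f"embedding parameters: {embedding_params:,}")
print(f"embedding size in bfloat16: {embedding_bytes / 2**20:.1f} MiB")
```

With a hidden size this small, the 202048-entry vocabulary accounts for roughly 155 million parameters on its own.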

Data Type

This model has been converted to bfloat16 format for efficient inference and reduced memory usage.
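
To make the trade-off concrete: bfloat16 keeps the sign bit and 8 exponent bits of a float32 but only 7 explicit mantissa bits, so each value takes 2 bytes instead of 4 at the cost of precision. The following is a minimal pure-Python sketch of that rounding (round-to-nearest-even, with NaN handling deliberately left out). The helper names are illustrative, and this is not necessarily how the checkpoint itself was converted.

```python
import math
import struct


def to_bfloat16_bits(x: float) -> int:
    """Round x to bfloat16 and return the 16-bit pattern (NaN not handled)."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits that will be dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    # bfloat16 is the upper half of a float32, so pad with 16 zero bits.
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


rounded_pi = from_bfloat16_bits(to_bfloat16_bits(math.pi))
print(f"pi in bfloat16: {rounded_pi}")
print(f"absolute error: {abs(math.pi - rounded_pi):.6f}")
```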

Usage

from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_name = "tonyzhao123/dummy_llama4_b16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

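# Example usage: tokenize a prompt and move the tensors to the model's device
text = "Hello, how are you today?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # avoids ambiguity between padding and EOS
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )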

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Information

This checkpoint was extracted at training step 12136. The model was trained using custom scripts with on-the-fly tokenization on the WikiText-103 dataset.
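
The training scripts themselves are not included here, so the following is only a sketch of what on-the-fly tokenization typically looks like: raw lines are tokenized as they are read and packed into fixed-length blocks, rather than tokenizing the whole corpus ahead of time. The `stream_blocks` helper and the character-level `toy_tokenize` stand-in are hypothetical; a real run would presumably use the model's tokenizer and the WikiText-103 text.

```python
from typing import Callable, Iterable, Iterator, List


def stream_blocks(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    block_size: int,
) -> Iterator[List[int]]:
    """Lazily tokenize texts and yield fixed-length blocks of token IDs."""
    buffer: List[int] = []
    for text in texts:
        if not text.strip():
            continue  # skip blank lines
        buffer.extend(tokenize(text))
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any trailing partial block is dropped in this sketch.


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one ID per character."""
    return [ord(ch) for ch in text]


for block in stream_blocks(["ab cd", "", "efg"], toy_tokenize, block_size=4):
    print(block)
```

Because `stream_blocks` is a generator, nothing is tokenized until a training loop asks for the next block, which is the point of doing the work on the fly.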

Files Included

config.json - Model configuration
model.safetensors - Model weights (single file, no sharding)
tokenizer.json - Fast tokenizer
tokenizer_config.json - Tokenizer configuration
special_tokens_map.json - Special tokens mapping
generation_config.json - Generation parameters (if available)
chat_template.jinja - Chat template (if available)
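
After downloading the repository to a local folder, a quick check against the list above can catch an incomplete copy. This is a minimal sketch using only the standard library; the `check_checkpoint_dir` helper and the `./dummy_llama4_b16` path are illustrative assumptions, and the two files marked "if available" are treated as optional.

```python
from pathlib import Path
from typing import List, Tuple

# File names copied from the Files Included list.
REQUIRED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]
OPTIONAL_FILES = ["generation_config.json", "chat_template.jinja"]


def check_checkpoint_dir(path: str) -> Tuple[List[str], List[str]]:
    """Return (missing required files, optional files that are present)."""
    root = Path(path)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    optional_found = [name for name in OPTIONAL_FILES if (root / name).is_file()]
    return missing, optional_found


# Hypothetical local path; adjust to wherever the files were downloaded.
missing, optional_found = check_checkpoint_dir("./dummy_llama4_b16")
print("missing required files:", missing)
print("optional files present:", optional_found)
```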

Limitations

This is an intermediate checkpoint and may not represent the final trained model
Performance may vary depending on the specific training step
Always evaluate the model on your specific use case
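
One common way to evaluate a language-model checkpoint on your own text is perplexity, the exponential of the mean per-token negative log-likelihood. The sketch below shows only that final arithmetic; obtaining the per-token losses from the model is left out, and the `perplexity` helper is an illustrative name rather than part of any library.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 0.5 to every token has perplexity of about 2.
uniform_half = [math.log(2.0)] * 4
print(perplexity(uniform_half))
```

Comparing this number across checkpoints on the same held-out text gives a simple indication of whether a later training step is an improvement for your data.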

Citation

@misc{tonyzhao123_dummy_llama4_b16_checkpoint_12136,
  title={tonyzhao123/dummy_llama4_b16 - Checkpoint 12136},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tonyzhao123/dummy_llama4_b16}
}
