tonyzhao123/dummy_llama4_b16

This is a checkpoint from step 12136 of a custom Llama4 training run.

Model Details

  • Base Model: meta-llama/Llama-4-Scout-17B-16E
  • Model Type: llama4
  • Architecture: Llama4ForConditionalGeneration
  • Training Step: 12136
  • Source Checkpoint: checkpoint-12136
  • Data Type: bfloat16

Model Configuration

  • Hidden Size: 768
  • Number of Layers: 8
  • Number of Experts (MoE): 4
  • Vocabulary Size: 202048
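
The values above can be read directly from config.json without loading the weights; a minimal sketch using AutoConfig (the text_config nesting is an assumption based on the Llama4 architecture):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("tonyzhao123/dummy_llama4_b16")
print(config.model_type)  # expected: "llama4"

# For Llama4ForConditionalGeneration the text settings are usually nested
# under text_config; fall back to the top-level config if they are not.
text_config = getattr(config, "text_config", config)
print(text_config.hidden_size, text_config.num_hidden_layers, text_config.vocab_size)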

Data Type

This model has been converted to bfloat16 format for efficient inference and reduced memory usage.
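
To confirm the data type after loading, a minimal sketch (this loads the full model into memory):

import torch
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "tonyzhao123/dummy_llama4_b16",
    torch_dtype=torch.bfloat16,
)

# All floating-point parameters should report torch.bfloat16
print({p.dtype for p in model.parameters()})

# Rough parameter memory footprint in GB
print(sum(p.numel() * p.element_size() for p in model.parameters()) / 1e9)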

Usage

from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_name = "tonyzhao123/dummy_llama4_b16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage
text = "Hello, how are you today?"
# Move inputs to the model's device (device_map="auto" may place layers on GPU)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes input_ids and attention_mask together
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
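
If the repository includes chat_template.jinja (see Files Included below), prompts can also be built with the tokenizer's chat template; a minimal sketch, assuming the template is present:

messages = [{"role": "user", "content": "Hello, how are you today?"}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    chat_outputs = model.generate(chat_inputs, max_new_tokens=100)

print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))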

Training Information

This checkpoint was extracted at training step 12136. The model was trained with custom scripts using on-the-fly tokenization on the WikiText-103 dataset.
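
The training scripts themselves are not part of this repository; purely as an illustration, on-the-fly tokenization over WikiText-103 might look like the following (the dataset configuration name and max_length are assumptions):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tonyzhao123/dummy_llama4_b16")

# Stream the corpus and tokenize examples as they are read,
# rather than pre-tokenizing the whole dataset up front.
dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train", streaming=True)
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024))

for sample in tokenized.take(1):
    print(sample["input_ids"][:16])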

Files Included

  • config.json - Model configuration
  • model.safetensors - Model weights (single file, no sharding)
  • tokenizer.json - Fast tokenizer
  • tokenizer_config.json - Tokenizer configuration
  • special_tokens_map.json - Special tokens mapping
  • generation_config.json - Generation parameters (if available)
  • chat_template.jinja - Chat template (if available)
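
Individual files can be fetched without downloading the full weights; a minimal sketch using huggingface_hub:

from huggingface_hub import hf_hub_download

# Download just the configuration file from the repo
config_path = hf_hub_download(
    repo_id="tonyzhao123/dummy_llama4_b16",
    filename="config.json",
)
print(config_path)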

Limitations

  • This is an intermediate checkpoint and may not represent the final trained model
  • Performance may vary depending on the specific training step
  • Always evaluate the model on your specific use case

Citation

@misc{tonyzhao123_dummy_llama4_b16_checkpoint_12136,
  title={tonyzhao123/dummy_llama4_b16 - Checkpoint 12136},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tonyzhao123/dummy_llama4_b16}
}