tonyzhao123/dummy_llama4

@@ -1,10 +1,11 @@
 ---
 license: apache-2.0
-base_model: meta-llama/Llama-3.2-1B
 tags:
 - fine-tuned
-- knowledge-distillation
-- llama
 language:
 - en
 pipeline_tag: text-generation
@@ -12,40 +13,47 @@ pipeline_tag: text-generation
 # tonyzhao123/dummy_llama4
-Dummy Llama 4 for small size EP debug and dist
 ## Usage
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
-# Load model and tokenizer
 model_name = "tonyzhao123/dummy_llama4"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(
     model_name,
-    torch_dtype=torch.float16,
     device_map="auto"
 )
 # Example usage
-messages = [
-    {"role": "user", "content": "Hello! How are you doing today?"}
-]
-# Apply chat template
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to(model.device)
-# Generate response
 with torch.no_grad():
     outputs = model.generate(
-        **inputs,
-        max_new_tokens=150,
         do_sample=True,
         temperature=0.7,
-        top_p=0.9,
         pad_token_id=tokenizer.eos_token_id
     )
@@ -53,12 +61,31 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
 ## Citation
 ```bibtex
-@misc{tonyzhao123/dummy_llama4_2024,
-  title={tonyzhao123/dummy_llama4},
   author={Your Name},
   year={2024},
   publisher={Hugging Face},

 ---
 license: apache-2.0
+base_model: meta-llama/Llama-4-Scout-17B-16E
 tags:
+- llama4
+- checkpoint
 - fine-tuned
+- step-400
 language:
 - en
 pipeline_tag: text-generation
 # tonyzhao123/dummy_llama4
+This is a checkpoint from step 400 of custom Llama4 training.
+## Model Details
+- **Base Model**: meta-llama/Llama-4-Scout-17B-16E
+- **Model Type**: llama4
+- **Architecture**: Llama4ForConditionalGeneration
+- **Training Step**: 400
+- **Source Checkpoint**: `checkpoint-400`
+## Model Configuration
+- **Hidden Size**: 768
+- **Number of Layers**: 8
+- **Number of Experts (MoE)**: 4
+- **Vocabulary Size**: 202048
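The embedding table is a quick sanity check on the listed values, because its size follows directly from the vocabulary and hidden sizes. The sketch below assumes a single `vocab_size × hidden_size` table. The card does not say whether the output projection (`lm_head`) is tied to it. If it is untied, it adds the same count again. The helper names are illustrative only.

```python
# Values copied from the "Model Configuration" list.
HIDDEN_SIZE = 768
VOCAB_SIZE = 202048

# Bytes per stored value for the two dtypes that appear in this commit.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in one vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def table_bytes(num_params: int, dtype: str) -> int:
    """Raw storage for num_params values in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"embedding parameters: {emb:,}")  # 155,172,864
for dtype in ("bfloat16", "float32"):
    print(f"{dtype}: {table_bytes(emb, dtype) / 2**20:.1f} MiB")
```

With a 202k-token vocabulary and a 768-wide hidden state, the embedding alone holds about 155M parameters. This likely makes it a large share of a deliberately tiny debug model.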
 ## Usage
 ```python
+from transformers import AutoTokenizer, AutoModelForImageTextToText
 import torch
 model_name = "tonyzhao123/dummy_llama4"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForImageTextToText.from_pretrained(
     model_name,
+    torch_dtype=torch.bfloat16,
     device_map="auto"
 )
 # Example usage
+text = "Hello, how are you today?"
+inputs = tokenizer(text, return_tensors="pt")
 with torch.no_grad():
     outputs = model.generate(
+        inputs.input_ids,
+        max_new_tokens=100,
         do_sample=True,
         temperature=0.7,
         pad_token_id=tokenizer.eos_token_id
     )
 print(response)
 ```
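The new-side snippet has two gaps. It passes only `inputs.input_ids`, and it leaves the tokenized inputs on the CPU even though `device_map="auto"` may place the model on a GPU. The sketch below fixes both. It moves the inputs to `model.device`, forwards the attention mask with `**inputs`, and decodes only the newly generated tokens. It has not been tested against this checkpoint. It assumes a `transformers` release with Llama 4 support and `AutoModelForImageTextToText`, plus `accelerate` for `device_map="auto"`. `MODEL_NAME`, `new_tokens` and `main` are names introduced here for illustration.

```python
from typing import List, Sequence

MODEL_NAME = "tonyzhao123/dummy_llama4"


def new_tokens(sequence: Sequence[int], prompt_length: int) -> List[int]:
    """Drop the echoed prompt so only generated token ids remain."""
    return list(sequence[prompt_length:])


def main() -> None:
    # Imported lazily so the helper above works without torch/transformers.
    import torch
    from transformers import AutoModelForImageTextToText, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Put the inputs on the same device as the model's parameters.
    inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,  # forwards attention_mask together with input_ids
            max_new_tokens=100,
            do_sample=True,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id,
        )

    prompt_length = inputs["input_ids"].shape[-1]
    reply_ids = new_tokens(outputs[0].tolist(), prompt_length)
    print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```

Call `main()` to download the checkpoint and sample a reply. Slicing off the prompt keeps the input text out of the printed response.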
+## Training Information
+This checkpoint was extracted from training step 400. The model was trained using custom scripts with on-the-fly tokenization on the WikiText-103 dataset.
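The card does not include the training scripts. "On-the-fly tokenization" usually means that raw text is tokenized batch by batch at load time rather than pre-tokenized into a cached dataset. The sketch below is a hypothetical collate function in that style, not the author's code. `byte_tokenize` is a byte-level stand-in for the real tokenizer. The `-100` label value is PyTorch's default `ignore_index` for cross-entropy, so padded positions do not contribute to the loss.

```python
from typing import Callable, Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def byte_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one id per UTF-8 byte."""
    return list(text.encode("utf-8"))


def make_collate_fn(
    tokenize: Callable[[str], List[int]],
    pad_id: int,
    max_length: int,
) -> Callable[[List[str]], Dict[str, List[List[int]]]]:
    """Build a collate function that tokenizes raw strings per batch."""

    def collate(texts: List[str]) -> Dict[str, List[List[int]]]:
        # Tokenize only this batch's texts, truncating each to max_length.
        ids = [tokenize(t)[:max_length] for t in texts]
        width = max((len(seq) for seq in ids), default=0)
        batch: Dict[str, List[List[int]]] = {
            "input_ids": [],
            "attention_mask": [],
            "labels": [],
        }
        for seq in ids:
            pad = width - len(seq)
            batch["input_ids"].append(seq + [pad_id] * pad)
            batch["attention_mask"].append([1] * len(seq) + [0] * pad)
            # Causal-LM labels mirror the inputs; padding is masked out.
            batch["labels"].append(seq + [IGNORE_INDEX] * pad)
        return batch

    return collate
```

In a real pipeline, you might pass such a function as `collate_fn` to `torch.utils.data.DataLoader` over the raw WikiText lines. You would then swap in the model's tokenizer and convert the lists to tensors.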
+## Files Included
+- `config.json` - Model configuration
+- `model.safetensors` - Model weights (single file, no sharding)
+- `tokenizer.json` - Fast tokenizer
+- `tokenizer_config.json` - Tokenizer configuration
+- `special_tokens_map.json` - Special tokens mapping
+- `generation_config.json` - Generation parameters (if available)
+- `chat_template.jinja` - Chat template (if available)
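The "Files Included" list can double as a checklist for a local copy of the checkpoint. The minimal sketch below takes its file names from that list and treats the two "(if available)" entries as optional. `check_checkpoint_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List

# File names from the "Files Included" list.
REQUIRED = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
]
# Marked "(if available)" in the list.
OPTIONAL = ["generation_config.json", "chat_template.jinja"]


def check_checkpoint_dir(path: str) -> Dict[str, List[str]]:
    """Report which listed files are missing from a local checkpoint directory."""
    root = Path(path)
    present = {name for name in REQUIRED + OPTIONAL if (root / name).is_file()}
    return {
        "missing_required": [n for n in REQUIRED if n not in present],
        "missing_optional": [n for n in OPTIONAL if n not in present],
    }
```

An empty `missing_required` list means every mandatory file is present. Missing optional files only mean that generation defaults or the chat template were not saved.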
+## Limitations
+- This is an intermediate checkpoint and may not represent the final trained model
+- Performance may vary depending on the specific training step
+- Always evaluate the model on your specific use case
 ## Citation
 ```bibtex
+@misc{tonyzhao123_dummy_llama4_checkpoint_400,
+  title={tonyzhao123/dummy_llama4 - Checkpoint 400},
   author={Your Name},
   year={2024},
   publisher={Hugging Face},

config.json CHANGED

@@ -77,7 +77,7 @@
     "rope_theta": 500000.0,
     "router_aux_loss_coef": 0.001,
     "router_jitter_noise": 0.0,
-    "torch_dtype": "bfloat16",
     "use_cache": true,
     "use_qk_norm": true,
     "vocab_size": 202048
@@ -104,6 +104,7 @@
     "projector_input_dim": 768,
     "projector_output_dim": 768,
     "rope_theta": 10000,
     "vision_feature_layer": -1,
     "vision_feature_select_strategy": "default",
     "vision_output_dim": 768

     "rope_theta": 500000.0,
     "router_aux_loss_coef": 0.001,
     "router_jitter_noise": 0.0,
+    "torch_dtype": "float32",
     "use_cache": true,
     "use_qk_norm": true,
     "vocab_size": 202048
     "projector_input_dim": 768,
     "projector_output_dim": 768,
     "rope_theta": 10000,
+    "torch_dtype": "float32",
     "vision_feature_layer": -1,
     "vision_feature_select_strategy": "default",
     "vision_output_dim": 768
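These hunks change the dtype in two places. `torch_dtype` goes from `"bfloat16"` to `"float32"` in the sub-config holding `vocab_size`, and a new `"float32"` entry appears in the sub-config holding `vision_output_dim`. This view does not show the enclosing keys. The sketch below assumes they are `text_config` and `vision_config` and reads the recorded dtype for each. The example config is a hypothetical trimmed-down copy of the new side.

```python
import json
from typing import Dict, Optional


def sub_config_dtypes(config_text: str) -> Dict[str, Optional[str]]:
    """Return the torch_dtype recorded in each sub-config, or None if absent."""
    config = json.loads(config_text)
    return {
        key: config.get(key, {}).get("torch_dtype")
        for key in ("text_config", "vision_config")
    }


# Hypothetical minimal config mirroring the new side of the diff; other keys omitted.
EXAMPLE = json.dumps({
    "text_config": {"rope_theta": 500000.0, "torch_dtype": "float32", "vocab_size": 202048},
    "vision_config": {"rope_theta": 10000, "torch_dtype": "float32", "vision_output_dim": 768},
})
```

The recorded dtype typically reflects the precision the weights were saved in. Float32 takes 4 bytes per parameter, versus 2 for bfloat16. Passing `torch_dtype=torch.bfloat16` to `from_pretrained`, as the usage snippet does, casts the weights at load time.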

tokenizer.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:172c9eb4beafc72601690da3ccfcede5c2e6806a8d5ec1fca33e22acea8023a4
-size 27948578

 version https://git-lfs.github.com/spec/v1
+oid sha256:b6cdf15c6af56b42f0ebed8200dbcae60691ea7d58b8b4029ddb4f45599043df
+size 27948867
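Because `tokenizer.json` is stored through Git LFS, the diff shows only the pointer file. The pointer has a `version` line, an `oid` line carrying the SHA-256 of the content, and a `size` line in bytes. Here the size grows by 289 bytes, from 27948578 to 27948867. The minimal sketch below checks a downloaded copy against such a pointer. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str) -> bool:
    """Check a local file's size and SHA-256 against an LFS pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    data_path = Path(path)
    # Cheap size check first, then hash the file in 1 MiB chunks.
    if data_path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with data_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# The new pointer from this commit.
NEW_TOKENIZER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b6cdf15c6af56b42f0ebed8200dbcae60691ea7d58b8b4029ddb4f45599043df
size 27948867
"""
```

Running `matches_pointer("tokenizer.json", NEW_TOKENIZER_POINTER)` on a fresh download should return `True` if you have the post-commit file.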