tonyzhao123 committed
Commit 96cb4aa · 1 Parent(s): 6ec8c1d

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +48 -21
  2. config.json +2 -1
  3. tokenizer.json +2 -2
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
 license: apache-2.0
-base_model: meta-llama/Llama-3.2-1B
+base_model: meta-llama/Llama-4-Scout-17B-16E
 tags:
+- llama4
+- checkpoint
 - fine-tuned
-- knowledge-distillation
-- llama
+- step-400
 language:
 - en
 pipeline_tag: text-generation
@@ -12,40 +13,47 @@ pipeline_tag: text-generation
 
 # tonyzhao123/dummy_llama4
 
-Dummy Llama 4 for small size EP debug and dist
+This is a checkpoint from step 400 of custom Llama4 training.
+
+## Model Details
+
+- **Base Model**: meta-llama/Llama-4-Scout-17B-16E
+- **Model Type**: llama4
+- **Architecture**: Llama4ForConditionalGeneration
+- **Training Step**: 400
+- **Source Checkpoint**: `checkpoint-400`
+
+## Model Configuration
+
+- **Hidden Size**: 768
+- **Number of Layers**: 8
+- **Number of Experts (MoE)**: 4
+- **Vocabulary Size**: 202048
 
 ## Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
+from transformers import AutoTokenizer, AutoModelForImageTextToText
 import torch
 
-# Load model and tokenizer
 model_name = "tonyzhao123/dummy_llama4"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(
+model = AutoModelForImageTextToText.from_pretrained(
     model_name,
-    torch_dtype=torch.float16,
+    torch_dtype=torch.bfloat16,
     device_map="auto"
 )
 
 # Example usage
-messages = [
-    {"role": "user", "content": "Hello! How are you doing today?"}
-]
-
-# Apply chat template
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to(model.device)
+text = "Hello, how are you today?"
+inputs = tokenizer(text, return_tensors="pt")
 
-# Generate response
 with torch.no_grad():
     outputs = model.generate(
-        **inputs,
-        max_new_tokens=150,
+        inputs.input_ids,
+        max_new_tokens=100,
         do_sample=True,
         temperature=0.7,
-        top_p=0.9,
         pad_token_id=tokenizer.eos_token_id
     )
 
@@ -53,12 +61,31 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
 
+## Training Information
+
+This checkpoint was extracted from training step 400. The model was trained using custom scripts with on-the-fly tokenization on WikiText-103 dataset.
+
+## Files Included
+
+- `config.json` - Model configuration
+- `model.safetensors` - Model weights (single file, no sharding)
+- `tokenizer.json` - Fast tokenizer
+- `tokenizer_config.json` - Tokenizer configuration
+- `special_tokens_map.json` - Special tokens mapping
+- `generation_config.json` - Generation parameters (if available)
+- `chat_template.jinja` - Chat template (if available)
+
+## Limitations
+
+- This is an intermediate checkpoint and may not represent the final trained model
+- Performance may vary depending on the specific training step
+- Always evaluate the model on your specific use case
 
 ## Citation
 
 ```bibtex
-@misc{tonyzhao123/dummy_llama4_2024,
-  title={tonyzhao123/dummy_llama4},
+@misc{tonyzhao123_dummy_llama4_checkpoint_400,
+  title={tonyzhao123/dummy_llama4 - Checkpoint 400},
   author={Your Name},
   year={2024},
   publisher={Hugging Face},
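
Put together, the updated usage section resolves to the runnable sketch below. The `response = tokenizer.decode(...)` line is unchanged context that sits outside the hunks above; moving the inputs to `model.device` and passing the attention mask explicitly are small additions here, not text from the README, and a transformers version with Llama 4 support is assumed.

```python
# Assembled from the updated README usage section. The explicit attention_mask
# and .to(model.device) are additions for robustness, not lines from the README.
from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_name = "tonyzhao123/dummy_llama4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "Hello, how are you today?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```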
config.json CHANGED
@@ -77,7 +77,7 @@
 "rope_theta": 500000.0,
 "router_aux_loss_coef": 0.001,
 "router_jitter_noise": 0.0,
-"torch_dtype": "bfloat16",
+"torch_dtype": "float32",
 "use_cache": true,
 "use_qk_norm": true,
 "vocab_size": 202048
@@ -104,6 +104,7 @@
 "projector_input_dim": 768,
 "projector_output_dim": 768,
 "rope_theta": 10000,
+"torch_dtype": "float32",
 "vision_feature_layer": -1,
 "vision_feature_select_strategy": "default",
 "vision_output_dim": 768
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:172c9eb4beafc72601690da3ccfcede5c2e6806a8d5ec1fca33e22acea8023a4
-size 27948578
+oid sha256:b6cdf15c6af56b42f0ebed8200dbcae60691ea7d58b8b4029ddb4f45599043df
+size 27948867
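
tokenizer.json is tracked with Git LFS, so this commit only rewrites the pointer file: `oid sha256:...` is the SHA-256 of the real tokenizer file and `size` is its byte count. The sketch below, which is not part of the repo, checks a downloaded copy against the new pointer.

```python
# Optional integrity check against the LFS pointer above: the oid is the SHA-256
# of the actual tokenizer.json, and size is its length in bytes.
import hashlib

from huggingface_hub import hf_hub_download

path = hf_hub_download("tonyzhao123/dummy_llama4", "tokenizer.json")
data = open(path, "rb").read()
print(hashlib.sha256(data).hexdigest())  # expected: b6cdf15c6af56b42... for this revision
print(len(data))                         # expected: 27948867 bytes
```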