kshitijthakkar committed · verified
Commit: ccd8b7d · Parent(s): 4fea745

Upload README.md with huggingface_hub

Files changed (1): README.md (added, +47 −0)

README.md:
---
license: apache-2.0
tags:
- text-generation
- kimi_k2
- muon
datasets:
- loggenix-rca
language:
- en
pipeline_tag: text-generation
---

# loggenix-nanoKimi2-test

This model was trained using the following configuration:

## Training Details
- **Base Architecture**: kimi_k2
- **Optimizer**: muon
- **Learning Rate**: 0.02
- **Weight Decay**: 0.1
- **Dataset**: loggenix-rca
- **Hidden Size**: 1024
- **Epochs**: 1
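
For reference, here is a minimal sketch of a Muon-style optimizer step using the hyperparameters listed above (lr=0.02, weight_decay=0.1). The actual training script is not included in this repo, so the Newton-Schulz orthogonalization below follows the public Muon reference implementation and is illustrative, not the exact code used:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D update matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference
    X = G.bfloat16()
    X = X / (X.norm() + eps)           # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

class Muon(torch.optim.Optimizer):
    """Momentum SGD whose 2D weight updates are orthogonalized (Muon-style sketch)."""
    def __init__(self, params, lr=0.02, weight_decay=0.1, momentum=0.95):
        super().__init__(params, dict(lr=lr, weight_decay=weight_decay, momentum=momentum))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p].setdefault("momentum_buffer", torch.zeros_like(p))
                buf.mul_(group["momentum"]).add_(p.grad)
                # Orthogonalize matrix-shaped params only; others use raw momentum.
                update = newton_schulz(buf) if p.ndim == 2 else buf
                p.mul_(1 - group["lr"] * group["weight_decay"])  # decoupled weight decay
                p.add_(update, alpha=-group["lr"])
```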

## Model Architecture
This is a Mixture of Experts (MoE) model based on the DeepSeek-V3 architecture, which the Kimi K2 design builds on.
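
To check the MoE layout of the uploaded checkpoint, you can inspect its config. The field names below follow the DeepSeek-V3 config in transformers and may not all be present on this checkpoint, hence the guarded access:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
print(config.hidden_size)  # expected: 1024, per the training details above
# Guarded access: these DeepSeek-V3-style fields are assumptions about this checkpoint.
print(getattr(config, "n_routed_experts", "n/a"))     # number of routed experts
print(getattr(config, "num_experts_per_tok", "n/a"))  # experts activated per token
```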

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Script
This model was trained using a custom training script with the Muon optimizer.
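
The script itself is not published here; the loop below is a minimal sketch of how the listed configuration could be wired together. The hub path for the loggenix-rca dataset and its "text" column are assumptions, and Muon refers to the sketch class shown earlier:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kshitijthakkar/loggenix-nanoKimi2-test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.train()

# Assumption: the dataset's hub path and text column; the card only says "loggenix-rca".
dataset = load_dataset("kshitijthakkar/loggenix-rca", split="train")

# Muon is the sketch optimizer defined above; lr/weight_decay match the training details.
optimizer = Muon(model.parameters(), lr=0.02, weight_decay=0.1)

for example in dataset:  # one pass = 1 epoch, per the training details
    batch = tokenizer(example["text"], return_tensors="pt", truncation=True, max_length=512)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```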