# loggenix-nanoKimi2-test
This model was trained using the following configuration:
## Training Details
- Base Architecture: kimi_k2
- Optimizer: muon
- Learning Rate: 0.02
- Weight Decay: 0.1
- Dataset: loggenix-rca
- Hidden Size: 1024
- Epochs: 1
## Model Architecture
This is a Mixture of Experts (MoE) model based on the DeepSeek-V3 architecture, which the Kimi K2 base architecture listed above follows.
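The architecture details listed above can be checked against the published configuration. The sketch below is a minimal example; the `trust_remote_code` flag and the `n_routed_experts` attribute name are assumptions based on DeepSeek-V3-style configs and may not apply to this checkpoint, so verify against the repository's `config.json`.

```python
from transformers import AutoConfig

# Inspect the published config; attribute names below follow DeepSeek-V3-style
# configs and are assumptions — check config.json for the exact field names.
config = AutoConfig.from_pretrained(
    "kshitijthakkar/loggenix-nanoKimi2-test",
    trust_remote_code=True,  # may be required for custom MoE architectures
)
print(config.hidden_size)                          # expected: 1024 per Training Details
print(getattr(config, "n_routed_experts", "n/a"))  # routed expert count, if present
```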
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Training Script
This model was trained using a custom training script with the Muon optimizer.
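The training script itself is not published with this card. Below is a minimal sketch of what such a run could look like, assuming a hypothetical `Muon` optimizer class and a `train_loader` over the tokenized loggenix-rca dataset; neither is provided here, and real Muon implementations typically split parameters into matrix and non-matrix groups, so the actual script may differ.

```python
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM

# Hypothetical import — the Muon implementation used for training is not
# published here; substitute whichever Muon variant your environment provides.
from muon import Muon

model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")

# Hyperparameters taken from the Training Details section above.
optimizer = Muon(model.parameters(), lr=0.02, weight_decay=0.1)

def train_one_epoch(train_loader: DataLoader) -> None:
    """One epoch of causal-LM training (the card lists Epochs: 1)."""
    model.train()
    for batch in train_loader:  # batches of tokenized loggenix-rca examples (placeholder)
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["input_ids"],  # next-token prediction loss
        )
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```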