loggenix-nanoKimi2-test

This model was trained using the following configuration:

Training Details

  • Base Architecture: kimi_k2
  • Optimizer: muon
  • Learning Rate: 0.02
  • Weight Decay: 0.1
  • Dataset: loggenix-rca
  • Hidden Size: 1024
  • Epochs: 1

Model Architecture

This is a Mixture of Experts (MoE) model based on the DeepseekV3 architecture, on which the kimi_k2 base architecture listed above builds. The snippet below shows how to confirm the architecture details from the published config.
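A minimal check, assuming the checkpoint ships its config on the Hub; model_type and hidden_size are standard transformers config attributes, and depending on your transformers version, trust_remote_code=True may be required for custom architectures.

from transformers import AutoConfig

# Inspect the configuration published with the checkpoint
config = AutoConfig.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
print(config.model_type)    # architecture family
print(config.hidden_size)   # expected: 1024, per the training details above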

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

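# Load the tokenizer and model from the Hugging Face Hub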
tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
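# Note: max_length counts the prompt tokens too; max_new_tokens would bound only the generated continuation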
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
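If a GPU is available, the same pattern works after moving the model and inputs to the device. This is a minimal sketch; the prompt and generation settings are arbitrary examples.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test").to(device)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))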

Training Script

This model was trained with a custom training script using the Muon optimizer and the hyperparameters listed under Training Details (learning rate 0.02, weight decay 0.1, 1 epoch). A rough sketch of an equivalent training loop is shown below.
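The actual script is not included in this card; the following is only a minimal approximation of a causal-LM training loop under the hyperparameters above. The tiny in-memory dataset is a placeholder for loggenix-rca, and torch's AdamW stands in for Muon, whose implementation ships with the custom script.

import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kshitijthakkar/loggenix-nanoKimi2-test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# AdamW stands in for the Muon optimizer; lr and weight decay mirror the card
optimizer = AdamW(model.parameters(), lr=0.02, weight_decay=0.1)

texts = ["placeholder training example"]  # stand-in for the loggenix-rca dataset
model.train()
for epoch in range(1):  # Epochs: 1
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()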

Model size: 208M parameters (F32, Safetensors).