HectorHe/Deepseek-V2-13B-Math7K-Expert-Enhance-Subset-Expert-MoE-32-experts • Text Generation • 16B • Updated 7 days ago • 13
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-math7k • Text Generation • 16B • Updated 8 days ago • 36 • 2
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-math14k • Text Generation • 16B • Updated 8 days ago • 24 • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-s1K • Text Generation • 16B • Updated 8 days ago • 27 • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-nemotron-code • Text Generation • 0.0B • Updated 8 days ago • 66 • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-mixture-new • 16B • Updated Jul 23 • 8
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-forward-kl-new • 16B • Updated Jul 22 • 7
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-test-may • 3B • Updated Jul 14 • 8
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-mixture • 16B • Updated Jul 10 • 8
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-forward-kl • 16B • Updated Jul 10 • 10
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-token-specific • 16B • Updated Jul 10 • 9
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-token-specific-scale • 16B • Updated Jul 10 • 9
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-token-specific • 3B • Updated Jul 1 • 8
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-token-specific-3-scaled • 3B • Updated Jul 1 • 9