Gaotang Li
gaotang
AI & ML interests
None yet
Recent Activity
upvoted a paper 14 days ago
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
upvoted a paper 20 days ago
MARS: Modular Agent with Reflective Search for Automated AI Research
upvoted a paper about 1 month ago
Agentic Reasoning for Large Language Models
Organizations
None yet
Beyond-Log-Likelihood
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
Paper • 2510.00526 • Published • 10
gaotang/figlet_font
Viewer • Updated • 45k • 7
gaotang/medical_sft_processed
Viewer • Updated • 23.5k • 14
gaotang/numina-cot-subset-67k
Viewer • Updated • 67.6k • 10
RM-R1
RM-R1: Reward Modeling as Reasoning
models 11
gaotang/deepseek-math-7b-base
Text Generation • 7B • Updated • 2
gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B
Text Generation • 8B • Updated • 32 • 2
gaotang/RM-R1-Qwen2.5-Instruct-7B
Text Generation • 8B • Updated • 182 • 4
gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B
Text Generation • 15B • Updated • 5 • 1
gaotang/RM-R1-Qwen2.5-Instruct-14B
Text Generation • 15B • Updated • 15 • 1
gaotang/RM-R1-Qwen2.5-Instruct-32B
Text Generation • 33B • Updated • 8 • 1
gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B
Text Generation • 33B • Updated • 82 • 2
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_Claude_o3_0419
8B • Updated
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_OpenAI
8B • Updated
gaotang/qwen_14b_sky_filtered_code8k_math_10k_distilled_OpenAI
15B • Updated
datasets 35
gaotang/figlet_font
Viewer • Updated • 45k • 7
gaotang/figlet_font_train
Viewer • Updated • 5 • 5
gaotang/huatuo_medical_sft_processed
Viewer • Updated • 19.7k • 7
gaotang/medical_sft_processed
Viewer • Updated • 23.5k • 14
gaotang/ParaConflict
Viewer • Updated • 2.15k • 15
gaotang/numina-cot-subset-val
Viewer • Updated • 128 • 4
gaotang/numina-cot-subset-67k
Viewer • Updated • 67.6k • 10
gaotang/ParaConfilct
Viewer • Updated • 2.15k • 27
gaotang/RM-R1-Reasoning-RLVR
Viewer • Updated • 73k • 38 • 1
gaotang/RM-R1-Entire-RLVR-Train
Viewer • Updated • 73k • 29 • 2