Simeng/math500_qwen2.5_7b_instruct_verification_Qwen2.5-14B-Instruct Viewer • Updated Jul 2 • 500 • 14
Simeng/math500_llama_3.2_3b_instruct_backtracking_Qwen2.5-14B-Instruct Viewer • Updated Jul 2 • 500 • 7
Simeng/math500_llama_3.2_3b_instruct_verification_Qwen2.5-72B-Instruct Viewer • Updated Jul 2 • 500 • 7
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 86
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1 • 46
Simeng/math500_deepseek_distilled_qwen15b_verification_qwen_25_7b_instruct Viewer • Updated Jul 1 • 500 • 4