Open LLM Leaderboard

Team

community

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard


AI & ML interests

Evaluating open LLMs

Recent Activity

AdinaY authored a paper 8 days ago

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

alozowski authored a paper 15 days ago

YourBench: Easy Custom Evaluation Sets for Everyone

AdinaY authored a paper about 2 months ago

RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies


Organization Card


Open LLM Leaderboard

This is the hub organisation maintaining the Open LLM Leaderboard.

In this organization you will find the datasets with detailed results and queries for the models on the leaderboard.

Score results are in the open-llm-leaderboard/results dataset, and the current state of requests is in the open-llm-leaderboard/requests dataset. For detailed predictions, look for your model name in the datasets below.
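The per-model detail datasets listed on this page appear to follow the naming pattern `open-llm-leaderboard/<owner>__<model>-details` (for example, `open-llm-leaderboard/rubenroy__Gilgamesh-72B-details`). As a minimal sketch, assuming that pattern holds for every model, a small helper can turn a model id into the dataset id to look for. The function name `details_repo_id` is hypothetical and not part of any library.

```python
def details_repo_id(model_id: str, org: str = "open-llm-leaderboard") -> str:
    """Build the assumed details-dataset id for a model id like 'owner/model'.

    The '<owner>__<model>-details' pattern is inferred from the dataset
    names listed on this page; it is an assumption, not a documented rule.
    """
    owner, sep, name = model_id.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/model', got {model_id!r}")
    # The slash in the model id is replaced by a double underscore.
    return f"{org}/{owner}__{name}-details"


if __name__ == "__main__":
    print(details_repo_id("rubenroy/Gilgamesh-72B"))
```

The resulting id could then be passed to a loader such as `datasets.load_dataset`, though the configuration and split names of these datasets are not stated here and would need to be checked on the dataset page.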

Track, rank and evaluate open LLMs and chatbots

108

Open LLM Leaderboard Model Comparator

🏆

Compare Open LLM Leaderboard results

125

Open-LLM performances are plateauing, let’s make the leaderboard steep again

🏔

Explore and compare advanced language models on a new leaderboard

Exploring model generations

👀

models 0

None public yet

datasets 4,504

open-llm-leaderboard/contents

Viewer • Updated Mar 20 • 4.58k • 10.4k • 21

open-llm-leaderboard/requests

Preview • Updated Mar 17 • 95.9k • 12

open-llm-leaderboard/rootxhacker__Apollo_v2-32B-details

Viewer • Updated Mar 15 • 43.2k • 88

open-llm-leaderboard/results

Preview • Updated Mar 15 • 21.5k • 16

open-llm-leaderboard/rubenroy__Gilgamesh-72B-details

Viewer • Updated Mar 14 • 43.2k • 98

open-llm-leaderboard/tomasmcm__sky-t1-coder-32b-flash-details

Viewer • Updated Mar 14 • 43.2k • 88

open-llm-leaderboard/Aryanne__QwentileSwap-details

Viewer • Updated Mar 14 • 43.2k • 92

open-llm-leaderboard/sthenno__tempesthenno-sft-0314-stage1-ckpt50-details

Viewer • Updated Mar 14 • 43.2k • 78

open-llm-leaderboard/braindao__DeepSeek-R1-Distill-Qwen-14B-ABUB-ST-details

Viewer • Updated Mar 13 • 43.2k • 86

open-llm-leaderboard/prithivMLmods__Galactic-Qwen-14B-Exp2-details

Viewer • Updated Mar 13 • 43.2k • 115



Collections 3

open-llm-leaderboard/tensopolis__virtuoso-lite-tensopolis-v2-details

open-llm-leaderboard/tensopolis__falcon3-10b-tensopolis-v1-details

open-llm-leaderboard/Pinkstack__SuperThoughts-CoT-14B-16k-o1-QwQ-details

open-llm-leaderboard/prithivMLmods__QwQ-LCoT-14B-Conversational-details

Open LLM Leaderboard

Open-LLM performances are plateauing, let’s make the leaderboard steep again

open-llm-leaderboard/contents

open-llm-leaderboard/results


spaces 5

Open LLM Leaderboard

Open LLM Leaderboard Model Comparator

Open-LLM performances are plateauing, let’s make the leaderboard steep again

Exploring model generations


Team members 18
