You made an error with SimpleQA scores

#50
by ID0M - opened

EDIT: My mistake. Didn't realize this was with internet search. Would love for them to quote the score with no tools/search.

Perhaps you meant to write something like 63.4 instead of 93.4%?

This is ridiculous and can't be right. Unless you tested it in some different, unusual way, but then we can't compare this score against other models, can we...

It's not an error. That SimpleQA score is listed under Search Agent, not General.

I think it is a base model, not an instruction-tuned model. They have not yet uploaded the 3.1 model.

https://huggingface.co/deepseek-ai/DeepSeek-V3.1

Yeah you are right, my bad. I realized this later and forgot about this discussion thread lol.

It would be nice for them to include the no-tools scores for this benchmark. I personally don't see much point in testing world knowledge with internet access; it just saturates a benchmark that is otherwise a great metric...
