Phil

phil111

AI & ML interests

None yet

Recent Activity

new activity 2 days ago

nvidia/NVIDIA-Nemotron-Nano-9B-v2: This just trades general performance for domain specific gains.

new activity 4 days ago

ByteDance-Seed/Seed-OSS-36B-Base: Please stop blindly trusting and reporting Alibaba's scores.

new activity 4 days ago

google/gemma-3-270m: Weird responses

Organizations

None yet

New activity in nvidia/NVIDIA-Nemotron-Nano-9B-v2 2 days ago

This just trades general performance for domain specific gains.

#3 opened 6 days ago by

New activity in ByteDance-Seed/Seed-OSS-36B-Base 4 days ago

Please stop blindly trusting and reporting Alibaba's scores.

#1 opened 4 days ago by

New activity in google/gemma-3-270m 4 days ago

Weird responses

#10 opened 8 days ago by

New activity in google/gemma-3-270m-it 7 days ago

Gemma A3B

#3 opened 10 days ago by

liked a dataset 10 days ago

Codatta/MM-Food-100K

Viewer • Updated 7 days ago • 100k • 655 • 23

New activity in openai/gpt-oss-120b 12 days ago

gpt-oss is actually good. even on less common benchmark

#109 opened 13 days ago by weijiejailbreak

New activity in openai/gpt-oss-20b 16 days ago

model quality issues

#92 opened 16 days ago by

New activity in Qwen/Qwen3-4B-Instruct-2507 17 days ago

Terrible instruction following

#3 opened 18 days ago by

New activity in Qwen/Qwen3-4B-Instruct-2507 18 days ago

4b model with an 84.2 MMLU-Redux score?

#2 opened 18 days ago by

New activity in openai/gpt-oss-20b 18 days ago

This model is unbelievably ignorant.

#14 opened 19 days ago by

New activity in openai/gpt-oss-120b 19 days ago

Knowledge limitations

#25 opened 19 days ago by

New activity in Qwen/Qwen3-30B-A3B-Instruct-2507 19 days ago

An Improvement, But Q3 30b Still Has Very Little General Knowledge

#2 opened 26 days ago by

Test Scores Can Be Misleading

#8 opened 25 days ago by

New activity in Qwen/Qwen3-235B-A22B-Instruct-2507 23 days ago

More Knowledge, But Hard To Extract

#29 opened 23 days ago by

New activity in zai-org/GLM-4.5 25 days ago

Impressive Broad Knowledge

#12 opened 25 days ago by

liked 2 models 27 days ago

zai-org/GLM-4.5-Air

Text Generation • 110B • Updated 13 days ago • 72.9k • 386

zai-org/GLM-4.5

Text Generation • 358B • Updated 13 days ago • 56.2k • 1.25k

New activity in baidu/ERNIE-4.5-300B-A47B-PT 28 days ago

The SimpleQA score of the model is WAY off.

#2 opened about 2 months ago by

New activity in Qwen/Qwen3-30B-A3B 29 days ago

Qwen3 is great, but could be better.

#18 opened 4 months ago by

New activity in Qwen/Qwen3-235B-A22B-Instruct-2507 about 1 month ago

SimpleQA jumped from 12.2 to 54.3?

#4 opened about 1 month ago by