lingang

seth-zou
AI & ML interests

None yet

Recent Activity

new activity 5 days ago
deepseek-ai/DeepSeek-V3.1-Base: Modelcard come On ~~
published a dataset 2 months ago
Alibaba-Cloud/ds-seth-01
reacted to clefourrier's post with 🤯 over 1 year ago
Fun fact about evaluation, part 2! How much do scores change depending on prompt format choice? Using different prompts (all present in the literature, from `Prompt question?` to `Question: prompt question?\nChoices: enumeration of all choices\nAnswer: `), we get a score range of... 10 points for a single model! Keep in mind that we only changed the prompt, not the evaluation subsets, etc. Again, this confirms that evaluation results reported without their details are basically bullshit. Prompt format on the x axis, all these evals look at the logprob of either "choice A/choice B..." or "A/B...". Incidentally, it also changes model rankings - so a "best" model might only be best on one type of prompt...

Organizations

Alibaba Cloud

seth-zou's models (1)

seth-zou/SethModel01

Unconditional Image Generation • Updated Feb 11, 2024