V3.1 seems to be pretty bad at everything except coding and mathematics.

#26
by qazqazqazqaz46 - opened

After testing various prompts, providers, and the official API, I found that this model is essentially designed for coding and is unsuitable for daily use. It is also poor at following instructions or prompts given by users.

I suppose V3.1 was originally designed as a generalist model with a hybrid mode to reduce costs, combining the benefits of a chat model for daily use and a thinking model for precise tasks like coding, math, and agent functions. Unfortunately, it turned out to be excellent at math, coding, and agent tasks but poor in every other area, regardless of the mode chosen.

I feel the same way, especially in automated agent use: the system prompt tells it what to do, but it often doesn't comply.

The hybrid models ended up failing... Just treat V3.1 as a pure coder.
