Using vLLM 0.10.0, reasoning content appears in the `content` field instead of `reasoning_content` with the Qwen3-30B-A3B-Thinking model

#2 · opened by aweffr

I'm encountering an issue with the Qwen3-30B-A3B-Thinking model when using vLLM v0.10.0. The model's reasoning process is being returned in the content field instead of the expected reasoning_content field.

Setup:

  • vLLM version: v0.10.0
  • Model: Qwen3-30B-A3B-Thinking-2507
  • Startup command:
vllm serve <MODEL_PATH> \
  --host 0.0.0.0 --port 8000 \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --served-model-name Qwen3-30B-A3B-Thinking-2507 \
  --enable-lora --max-loras 8 --max-lora-rank 32 \
  --trust-remote-code \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice --tool-call-parser hermes \
  --enable-chunked-prefill

Issue:
When making a chat completion request, the model's reasoning content appears in the content field with the following characteristics:

  • Missing opening <think> tag
  • Contains closing </think> tag
  • All of the reasoning is mixed with the final response in the content field
  • The reasoning_content field is null

Expected behavior:
The reasoning process should be separated into the reasoning_content field, with only the final answer in the content field.
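
For reference, here is a minimal sketch of the request I use to check the two fields. It assumes the server started with the command above is reachable on localhost:8000; the "EMPTY" API key is a placeholder, since vLLM's OpenAI-compatible server accepts any key unless --api-key is set:

from openai import OpenAI

# Placeholder endpoint: the vLLM server from the startup command above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3-30B-A3B-Thinking-2507",
    messages=[{"role": "user", "content": "你好"}],
)

msg = resp.choices[0].message
# Expected: the chain of thought shows up here ...
print("reasoning_content:", getattr(msg, "reasoning_content", None))
# ... and only the final answer here. What I actually get is reasoning_content=None
# and content holding everything up to and including "</think>".
print("content:", msg.content)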

Questions:

  1. Is this a vLLM startup configuration issue?
  2. Does the model's chat template need special configuration?
  3. Are there additional parameters needed for proper reasoning content separation?

Any guidance from the community would be greatly appreciated!


Sample Response (Problematic)

{
  "id": "chatcmpl-8f283e39ea9d450593e7b22a2618fbd9",
  "object": "chat.completion",
  "created": 1754014569,
  "model": "Qwen3-30B-A3B-Thinking-2507",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "嗯,用户发来"你好",看起来是个很简单的问候。首先,我需要确认用户的需求。可能他们刚开始和我对话,想测试一下我的反应。也可能是想开始一个新的话题,但还没想好具体要问什么。\n\n接下来,我要考虑如何回应。作为AI助手,应该友好、热情,同时引导用户说出他们的需求。避免太长的回复,保持简洁。可能需要用中文,因为用户的问候是中文的。\n\n然后,检查有没有其他潜在需求。比如,用户可能之前有其他问题,但这里只是开始。或者他们可能想确认我是否在线。所以,回复要包含欢迎语,并主动询问需要什么帮助。\n\n还要注意语气,不要太正式,用"你好!"开头,加个表情符号可能更亲切,比如😊。然后列出几个常见帮助方向,比如解答问题、写故事、编程等,这样用户知道可以怎么用我。\n\n另外,避免使用复杂词汇,确保易懂。比如"有什么问题或需要帮忙的吗?"这样比较自然。可能还要检查有没有错别字,确保回复准确。\n\n最后,确保符合之前的指示,比如不涉及敏感内容,保持中立。现在组织语言:先问候,再表达乐意帮忙,然后举例说明能提供的帮助,最后邀请用户提问。这样结构清晰,用户容易回应。\n</think>\n\n你好!😊 很高兴见到你~有什么问题或需要帮忙的吗?比如解答疑问、写故事、编程,或者随便聊聊都可以哦!等你来告诉我~",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 11,
    "total_tokens": 326,
    "completion_tokens": 315,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "kv_transfer_params": null
}

Expected Response Structure

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "你好!😊 很高兴见到你~有什么问题或需要帮忙的吗?比如解答疑问、写故事、编程,或者随便聊聊都可以哦!等你来告诉我~",
        "reasoning_content": "嗯,用户发来"你好",看起来是个很简单的问候。首先,我需要确认用户的需求...[reasoning process]"
      }
    }
  ]
}

Please read this model's description again.

Additionally, to enforce model thinking, the default chat template automatically includes <think> in the prompt. Therefore, it is normal for the model's output to contain only the closing </think> without an explicit opening <think> tag.
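
If you just need the split on the client side while the server setup is being sorted out, a rough workaround (my own sketch, not part of vLLM) is to cut the raw content on the closing tag yourself:

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer) for the case where the
    opening <think> was injected by the chat template, so only </think> appears."""
    if "</think>" in raw:
        reasoning, _, answer = raw.partition("</think>")
        return reasoning.strip(), answer.strip()
    return "", raw.strip()

# Example with a shortened version of the problematic response above.
raw = "嗯,用户发来问候,需要友好回应。\n</think>\n\n你好!😊 有什么需要帮忙的吗?"
reasoning, answer = split_reasoning(raw)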

LM Studio updated their code to support this. I'm not sure about vLLM.

https://lmstudio.ai/beta-releases#release-notes---lm-studio-0321-build-2-beta

You can use vllm serve <MODEL_PATH> <other_parameters> --chat-template ./chat_template.jinja to prompt the model to generate <think> tags on its own. This works for Open WebUI. I just tested it and found it also works for chat completion requests :)

Save the following code to ./chat_template.jinja:

{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = ((content.split('</think>')|first).rstrip('\n').split('<think>')|last).lstrip('\n') %}
                {%- set content = (content.split('</think>')|last).lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
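
After restarting the server with --chat-template ./chat_template.jinja (and keeping --reasoning-parser qwen3), a quick check along these lines is enough to confirm the split; the base URL, API key, and served model name are the same placeholders as above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3-30B-A3B-Thinking-2507",
    messages=[{"role": "user", "content": "你好"}],
)

msg = resp.choices[0].message
# With the custom template the model emits its own <think> block, the qwen3
# reasoning parser strips it, and the two fields come back separated.
assert getattr(msg, "reasoning_content", None), "reasoning was not separated"
assert "</think>" not in (msg.content or ""), "closing tag leaked into content"
print(msg.content)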
