Having issues with repetition.

#2
by rdsm - opened

Is anyone else having issues with the model repeating itself?
After being deployed for some time, the model started repeating itself: "The !!!!!!!!!!!!!!!!!(... continues indefinitely...)"

$ curl -s -X POST "http://[my internal url]/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/Kimi-K2.5-NVFP4",
    "messages": [{"role": "user", "content": "Hello! What can you do?"}],
    "max_tokens": 100,
    "temperature": 0.7
  }' | python -m json.tool
{
    "id": "chatcmpl-960b0d2b9bd89f72",
    "object": "chat.completion",
    "created": 1771856932,
    "model": "nvidia/Kimi-K2.5-NVFP4",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning": " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
            },
            "logprobs": null,
            "finish_reason": "length",
            "stop_reason": null,
            "token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 33,
        "total_tokens": 133,
        "completion_tokens": 100,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "prompt_token_ids": null,
    "kv_transfer_params": null
}

Hardware: B200s
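For anyone who wants to catch this failure automatically, here is a minimal sketch of a check that could be run against a canary request. It assumes the response has the shape shown above (a `choices[0].message` object with `content` and `reasoning` fields); the function names and the `min_run` threshold of 20 are illustrative choices, not part of vLLM.

```python
import re


def longest_char_run(text: str) -> int:
    """Return the length of the longest run of one repeated character."""
    longest = 0
    # (.)\1* matches a character followed by any number of copies of itself.
    for match in re.finditer(r"(.)\1*", text, flags=re.DOTALL):
        longest = max(longest, len(match.group(0)))
    return longest


def looks_degenerate(response: dict, min_run: int = 20) -> bool:
    """Flag a chat completion whose content or reasoning is a long repeated run.

    min_run is an arbitrary threshold (an assumption); tune it for your traffic.
    """
    message = response["choices"][0]["message"]
    for field in ("content", "reasoning"):
        value = message.get(field) or ""  # fields may be missing or null
        if longest_char_run(value) >= min_run:
            return True
    return False
```

Sending a short prompt on a schedule and passing the parsed JSON to `looks_degenerate` would give an early signal once the server starts emitting the "!!!!" pattern under load.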

Same issue here.
Any update?

No luck. I reverted the deployment to the Moonshot version.

Hi @rdsm, can you describe your deployment setup?

@Xinxinli I am using 8x B300s. I took the vllm-openai image at cu130-nightly-7b6e5289bce66d33e338cdba5ea3e0db174d1f53 and applied the fixes from https://github.com/vllm-project/vllm/pull/33764#issuecomment-3916675391. Apparently a fix has since been merged into vLLM mainline.

How are you running vLLM? Works fine for me using v0.17.1

@g-a-b-y, I am using two configurations, 4x B300s and 8x B300s, last tested on v0.18.0-cu130. The model is initially fine, but after some load it eventually starts the repetition pattern. I ran heavy benchmarks and noticed no issues, but after it was released to the public the problem started again. It seems to be related to some specific type of request that triggers the issue.

Initially I noticed the issue only on the NVFP4 variant, but now I also see it on the regular INT4 moonshotai one when using the most recent vLLM versions.

There are a few theories at https://github.com/vllm-project/vllm/issues/36763.

@g-a-b-y, would you mind sharing more information about your deployment? Maybe the startup flags and parameters you are passing, and the kind of load it is exposed to?

My last attempt was on the regular moonshotai model. I tried this (enable_flashinfer_autotune was a suggestion from Wei Zhao):

/usr/bin/python3 /usr/local/bin/vllm serve moonshotai/Kimi-K2.5 --tensor-parallel-size 4 --mm-encoder-tp-mode data --trust-remote-code --tool-call-parser kimi_k2 --reasoning-parser kimi_k2 --gpu-memory-utilization 0.95 --enable-auto-tool-choice --kernel-config '{"enable_flashinfer_autotune": false}'

@rdsm I'm using 0.17.1 with CUDA 12.8

Same command as yours, but my GPU memory utilization is 0.93 and I don't set enable_flashinfer_autotune in the kernel config.

@rdsm We experienced the exact same issue on 4x B300 and only found out after extended load.

@g-a-b-y are you running on B300s?

@rdsm I'm not. I'm using a custom tool parser though. Could it be that?

It is the one from here: https://github.com/vllm-project/vllm/issues/37184#issuecomment-4073230433

The issue was found and fixed by the vLLM team; vLLM v0.18.1 has the fix.
For v0.18.0, passing --attention-config.use_trtllm_ragged_deepseek_prefill=True fixes the problem.
More details at: https://github.com/vllm-project/vllm/pull/38562
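If you template your launch command, the version rule above could be encoded roughly like this. This is only a sketch: the helper names are made up for illustration, and returning no extra flag for versions other than v0.18.0 is an assumption, since the thread only states the workaround for v0.18.0 and the fix in v0.18.1.

```python
import re

# Flag quoted in this thread as the v0.18.0 workaround.
WORKAROUND_FLAG = "--attention-config.use_trtllm_ragged_deepseek_prefill=True"


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse tags such as 'v0.18.0-cu130' or '0.18.1' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized vLLM version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def extra_serve_args(tag: str) -> list[str]:
    """Return extra `vllm serve` arguments for the given vLLM version tag."""
    if parse_version(tag) == (0, 18, 0):
        return [WORKAROUND_FLAG]
    # Assumption: other versions get no extra flag (v0.18.1 ships the fix).
    return []
```

For example, `extra_serve_args("v0.18.0-cu130")` returns a one-element list containing the workaround flag, which can be appended to the existing `vllm serve` arguments.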
