Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
 
 Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, featuring the following key enhancements:
 
-- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet
+- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet.
 - **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding (see the configuration sketch after this hunk).
 - **Agentic Coding** supporting most platforms such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
 
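For reference, YaRN-based context extension is typically switched on through the `rope_scaling` field in the checkpoint's `config.json`. A minimal sketch, assuming the Hugging Face Transformers `rope_scaling` convention; the `4.0` factor is an assumption derived from 1M / 256K, so check the model card for the exact values:

```python
# Hypothetical sketch: enable YaRN long-context scaling by patching config.json.
# Field names follow the Transformers rope_scaling convention; the factor is an
# assumption (1,048,576 / 262,144 = 4.0), not a value confirmed by this README.
import json

with open("config.json") as f:
    config = json.load(f)

config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,                               # assumed: 1M / 256K
    "original_max_position_embeddings": 262144,  # native 256K context
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```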
@@ -84,21 +84,9 @@ content = tokenizer.decode(output_ids, skip_special_tokens=True)
 print("content:", content)
 ```
 
-For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint (see the client sketch below):
-- SGLang:
-```shell
-python -m sglang.launch_server --model-path Qwen/Qwen3-480B-A35B-Instruct-FP8 --tp 8 --enable-ep-moe --context-length 262144
-```
-- vLLM:
-```shell
-vllm serve Qwen/Qwen3-480B-A35B-Instruct-FP8 --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 262144
-```
-
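Both launch commands expose an OpenAI-compatible HTTP API, so any OpenAI client can talk to the server. A minimal sketch, assuming the server listens on `localhost:8000` (vLLM's default; SGLang uses port `30000` unless `--port` is set) and that no API key is configured:

```python
# Minimal client sketch for the OpenAI-compatible endpoint launched above.
# Assumptions: host/port and the placeholder API key; the model name must
# match the one passed to the server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-480B-A35B-Instruct-FP8",
    messages=[{"role": "user", "content": "Write a quicksort function in Python."}],
)
print(response.choices[0].message.content)
```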
 **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
 
 
-For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
-
 ## Note on FP8
 
 For convenience and performance, we have provided an `fp8`-quantized model checkpoint for Qwen3, whose name ends with `-FP8`. The quantization method is fine-grained `fp8` quantization with a block size of 128. You can find more details in the `quantization_config` field in `config.json`.
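A quick way to confirm those settings is to read the field directly from a local copy of the checkpoint. A minimal sketch; only the `quantization_config` field and the 128 block size come from this README, and the exact keys inside that field depend on the checkpoint:

```python
# Sketch: inspect the fine-grained fp8 quantization settings described above.
import json

with open("config.json") as f:
    config = json.load(f)

# Expect a fine-grained fp8 scheme with a 128 block size (exact keys may vary).
print(json.dumps(config.get("quantization_config", {}), indent=2))
```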