Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
 
 Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, featuring the following key enhancements:
 
-- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet
+- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet.
 - **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding (see the configuration sketch after this hunk).
 - **Agentic Coding** supporting most platforms such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
 
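For reference, YaRN-based context extension is typically switched on through the `rope_scaling` field in the checkpoint's `config.json`. A minimal sketch, assuming the Hugging Face Transformers `rope_scaling` convention; the `4.0` factor is an assumption derived from 1M / 256K, so check the model card for the exact values:

```python
# Hypothetical sketch: enable YaRN long-context scaling by patching config.json.
# Field names follow the Transformers rope_scaling convention; the factor is an
# assumption (1,048,576 / 262,144 = 4.0), not a value confirmed by this README.
import json

with open("config.json") as f:
    config = json.load(f)

config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,                               # assumed: 1M / 256K
    "original_max_position_embeddings": 262144,  # native 256K context
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```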
@@ -84,21 +84,9 @@ content = tokenizer.decode(output_ids, skip_special_tokens=True)
 print("content:", content)
 ```
 
-For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint (see the client sketch below):
-- SGLang:
-```shell
-python -m sglang.launch_server --model-path Qwen/Qwen3-480B-A35B-Instruct-FP8 --tp 8 --enable-ep-moe --context-length 262144
-```
-- vLLM:
-```shell
-vllm serve Qwen/Qwen3-480B-A35B-Instruct-FP8 --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 262144
-```
-
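Both launch commands expose an OpenAI-compatible HTTP API, so any OpenAI client can talk to the server. A minimal sketch, assuming the server listens on `localhost:8000` (vLLM's default; SGLang uses port `30000` unless `--port` is set) and that no API key is configured:

```python
# Minimal client sketch for the OpenAI-compatible endpoint launched above.
# Assumptions: host/port and the placeholder API key; the model name must
# match the one passed to the server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-480B-A35B-Instruct-FP8",
    messages=[{"role": "user", "content": "Write a quicksort function in Python."}],
)
print(response.choices[0].message.content)
```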
 **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
 
 
-For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
-
 ## Note on FP8
 
 For convenience and performance, we have provided an `fp8`-quantized model checkpoint for Qwen3, whose name ends with `-FP8`. The quantization method is fine-grained `fp8` quantization with a block size of 128. You can find more details in the `quantization_config` field in `config.json`.
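A quick way to confirm those settings is to read the field directly from a local copy of the checkpoint. A minimal sketch; only the `quantization_config` field and the 128 block size come from this README, and the exact keys inside that field depend on the checkpoint:

```python
# Sketch: inspect the fine-grained fp8 quantization settings described above.
import json

with open("config.json") as f:
    config = json.load(f)

# Expect a fine-grained fp8 scheme with a 128 block size (exact keys may vary).
print(json.dumps(config.get("quantization_config", {}), indent=2))
```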