cyente committed · Commit f9ce528 · verified · 1 Parent(s): 631e62b

Update README.md

Files changed (1): README.md +1 -13
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
 
 Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, featuring the following key enhancements:
 
- - **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet 4.
+ - **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet.
 - **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using Yarn, optimized for repository-scale understanding (see the Yarn launch sketch after this hunk).
 - **Agentic Coding** supporting most platforms such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
 
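As referenced in the long-context bullet, extending past the native 256K context means enabling Yarn at launch time. Below is a hedged sketch using vLLM's `--rope-scaling` argument; the scaling factor of 4.0 (262,144 × 4 ≈ 1M tokens) and the exact JSON key names are assumptions to verify against your vLLM version:

```shell
# Sketch: extend the native 256K context toward 1M with Yarn rope scaling.
# factor=4.0 and the JSON key names are assumptions; confirm in vLLM's docs.
vllm serve Qwen/Qwen3-480B-A35B-Instruct-FP8 \
  --tensor-parallel-size 8 --enable-expert-parallel \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":262144}' \
  --max-model-len 1000000
```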
@@ -84,21 +84,9 @@ content = tokenizer.decode(output_ids, skip_special_tokens=True)
 print("content:", content)
 ```
 
- For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
- - SGLang:
- ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-480B-A35B-Instruct-FP8 --tp 8 --enable-ep-moe --context-length 262144
- ```
- - vLLM:
- ```shell
- vllm serve Qwen/Qwen3-480B-A35B-Instruct-FP8 --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 262144
- ```
-
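Once either server is running, it exposes an OpenAI-compatible chat-completions endpoint. A minimal request sketch, assuming vLLM's default port `8000` (SGLang defaults to `30000`); the prompt is illustrative:

```shell
# Query the OpenAI-compatible endpoint started by the commands above.
# Port 8000 is vLLM's default; adjust host/port for your deployment.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-480B-A35B-Instruct-FP8",
        "messages": [{"role": "user", "content": "Write a quick sort algorithm."}]
      }'
```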
  **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
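For example, the launch commands above with the context capped at 32K (same flags, only the length changed):

```shell
# Reduced-context variants of the launch commands, for constrained GPU memory.
python -m sglang.launch_server --model-path Qwen/Qwen3-480B-A35B-Instruct-FP8 --tp 8 --enable-ep-moe --context-length 32768

vllm serve Qwen/Qwen3-480B-A35B-Instruct-FP8 --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 32768
```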
 
- For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
-
 ## Note on FP8
 
 For convenience and performance, we have provided an `fp8`-quantized model checkpoint for Qwen3, whose name ends with `-FP8`. The quantization method is fine-grained `fp8` quantization with a block size of 128. You can find more details in the `quantization_config` field in `config.json`.
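A quick way to inspect those settings is to read them from the checkpoint's `config.json` directly; a minimal sketch, assuming the `-FP8` checkpoint has been downloaded to the current directory:

```shell
# Print the fine-grained fp8 quantization details recorded in config.json.
python -c "import json; print(json.dumps(json.load(open('config.json'))['quantization_config'], indent=2))"
```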
 