Q2 performance results on dual RTX 5090 GPUs with 20 layers offloaded to the GPUs:

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| qwen3moe ?B Q2_K - Medium | 162.66 GiB | 480.15 B | CUDA | 20 | pp512 | 90.09 ± 1.14 |
| qwen3moe ?B Q2_K - Medium | 162.66 GiB | 480.15 B | CUDA | 20 | tg128 | 12.51 ± 0.11 |
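
As a rough sketch (not part of the original benchmark), the table's figures can be turned into two derived numbers: the average storage cost per parameter of the Q2_K file, and a naive wall-clock estimate for one request. The helper names `bits_per_weight` and `estimate_seconds` are hypothetical, and the latency estimate assumes the pp512 (prompt processing) and tg128 (token generation) rates hold unchanged for other prompt and output lengths, which real runs may not follow.

```python
# Hypothetical helpers: derive rough figures from the benchmark table above.
# Assumption: throughput is constant regardless of prompt/output length.

GIB = 1024 ** 3

# Values copied from the table (Q2_K - Medium, ngl = 20, dual RTX 5090).
MODEL_SIZE_GIB = 162.66
PARAMS = 480.15e9
PP_TOKENS_PER_S = 90.09   # pp512: prompt processing throughput
TG_TOKENS_PER_S = 12.51   # tg128: token generation throughput


def bits_per_weight(size_gib: float, params: float) -> float:
    """Average number of bits stored per parameter across the whole file."""
    return size_gib * GIB * 8 / params


def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float = PP_TOKENS_PER_S,
                     tg_tps: float = TG_TOKENS_PER_S) -> float:
    """Naive latency: time to ingest the prompt plus time to generate tokens.

    Ignores any change in speed at longer context lengths.
    """
    return prompt_tokens / pp_tps + output_tokens / tg_tps


if __name__ == "__main__":
    print(f"effective bits/weight: {bits_per_weight(MODEL_SIZE_GIB, PARAMS):.2f}")
    print(f"512-token prompt + 128 new tokens: {estimate_seconds(512, 128):.1f} s")
```

With the table's values this prints about 2.91 bits per weight (slightly above 2 because a Q2_K file mixes quantization types) and roughly 15.9 s for a 512-token prompt followed by 128 generated tokens.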

'Make knowledge free for everyone'

Quantized version of: Qwen/Qwen3-Coder-480B-A35B-Instruct


Available GGUF quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

