Qwen3-4B-Instruct-2507-GGUF

Qwen3-4B-Instruct-2507 is a 4-billion-parameter causal language model tuned for instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, with substantial gains in long-tail knowledge coverage across multiple languages. It operates in non-thinking mode only and does not emit explicit reasoning tags. Architecturally, it has 36 layers, grouped-query attention (GQA) with 32 query heads and 8 key-value heads, and a native context length of 262,144 tokens.
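The GQA layout above matters for memory planning: with 8 KV heads instead of 32, the KV cache is a quarter the size it would be under full multi-head attention. A rough sketch of the cache footprint, assuming a 128-dimensional attention head (the head dimension is not stated above, so treat it as an assumption) and an fp16 cache:

```python
# Rough KV-cache size estimate for Qwen3-4B-Instruct-2507 (GQA: 8 KV heads).
# HEAD_DIM = 128 is an assumption; the card above does not state it.
LAYERS = 36
KV_HEADS = 8
HEAD_DIM = 128          # assumed
BYTES_PER_VALUE = 2     # fp16/bf16 cache

def kv_cache_bytes(tokens: int) -> int:
    # 2x for keys and values, per layer, per KV head, per token.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens

print(kv_cache_bytes(1))                   # 147456 bytes (~144 KiB) per token
print(kv_cache_bytes(262_144) / 2**30)     # 36.0 GiB at the full 262,144-token context
```

Under these assumptions, filling the full native context costs about 36 GiB of KV cache alone, which is why practical deployments usually cap the context window well below the maximum.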

The model is pretrained and post-trained for stronger alignment with user preferences, producing more helpful, higher-quality responses on subjective and open-ended tasks. It can be deployed efficiently via popular toolkits such as Hugging Face Transformers, SGLang, and vLLM, and its long-context support suits complex, document-scale tasks. Code examples, benchmark evaluations, deployment instructions, and agentic tool-calling with Qwen-Agent are documented in the official repository on Hugging Face.
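For the GGUF files in this repo specifically, llama.cpp is the typical local route (the Transformers/SGLang/vLLM instructions live in the upstream Qwen repository). A minimal sketch, assuming the Q4_K_M file from the table below and a llama.cpp build on your PATH:

```shell
# Fetch one quant from this repo (file name exactly as listed in the table below).
huggingface-cli download prithivMLmods/Qwen3-4B-Instruct-2507-GGUF \
  Qwen3-4B-Thinking-2507.Q4_K_M.gguf --local-dir .

# Chat with it via llama.cpp; -c caps the context well below the 262,144-token
# maximum to keep KV-cache memory manageable, and -cnv enables conversation mode.
llama-cli -m Qwen3-4B-Thinking-2507.Q4_K_M.gguf -c 16384 -cnv
```

Any other quant from the table works the same way; pick a larger file (Q6_K, Q8_0) if quality matters more than memory.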

Model Files

| File Name | Size | Quant Type |
|---|---|---|
| Qwen3-4B-Thinking-2507.BF16.gguf | 8.05 GB | BF16 |
| Qwen3-4B-Thinking-2507.F16.gguf | 8.05 GB | F16 |
| Qwen3-4B-Thinking-2507.F32.gguf | 16.1 GB | F32 |
| Qwen3-4B-Thinking-2507.Q2_K.gguf | 1.67 GB | Q2_K |
| Qwen3-4B-Thinking-2507.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| Qwen3-4B-Thinking-2507.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| Qwen3-4B-Thinking-2507.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| Qwen3-4B-Thinking-2507.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| Qwen3-4B-Thinking-2507.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| Qwen3-4B-Thinking-2507.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| Qwen3-4B-Thinking-2507.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| Qwen3-4B-Thinking-2507.Q6_K.gguf | 3.31 GB | Q6_K |
| Qwen3-4B-Thinking-2507.Q8_0.gguf | 4.28 GB | Q8_0 |

Quants Usage

(Sorted by size, not necessarily quality; IQ-quants are often preferable to similar-sized non-IQ quants.)
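The sizes in the table track bits per weight, which is a quick way to sanity-check a quant against the 4.02B parameter count (file sizes taken from the table above; decimal gigabytes assumed):

```python
# Approximate bits per weight for a few quants from the table above.
PARAMS = 4.02e9  # 4.02B parameters

def bits_per_weight(file_gb: float) -> float:
    # Convert decimal GB to bits, divide by parameter count.
    return file_gb * 1e9 * 8 / PARAMS

for name, gb in [("Q4_K_M", 2.5), ("Q8_0", 4.28), ("BF16", 8.05)]:
    print(f"{name}: {bits_per_weight(gb):.2f} bits/weight")
# Q4_K_M: 4.98 bits/weight
# Q8_0: 8.52 bits/weight
# BF16: 16.02 bits/weight
```

Values land slightly above the nominal bit width because GGUF files also carry scales, tokenizer data, and metadata.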

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[graph: quantization quality comparison by ikawrakow; image not included here]

Model Details

- Downloads last month: 2,575
- Format: GGUF
- Model size: 4.02B params
- Architecture: qwen3
