# Qwen3-4B-Instruct-2507-GGUF
Qwen3-4B-Instruct-2507 is a 4-billion-parameter causal language model built for advanced instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage. It delivers significant improvements in these general capabilities and substantial gains in long-tail knowledge coverage across multiple languages. The model operates in non-thinking mode only and does not generate explicit reasoning-step tags. Architecturally, it has 36 layers, 32 query heads and 8 key-value heads with grouped-query attention (GQA), and a native context length of 262,144 tokens.
The model is pretrained and post-trained for stronger performance and closer alignment with user preferences, producing more helpful, higher-quality responses on subjective and open-ended tasks. It can be deployed efficiently with popular toolkits such as Hugging Face Transformers, SGLang, and vLLM, and its long-context support makes it suitable for complex tasks. Code examples, benchmark evaluations, deployment instructions, and agentic tool-calling with Qwen-Agent are fully documented in the official repository on Hugging Face.
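Since this repo ships GGUF quantizations, a common way to run them locally is llama.cpp. Below is a minimal sketch using llama-cpp-python; it is not taken from the official docs, and the chosen quant file, context size, and sampling settings are illustrative assumptions.

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The quant file (Q4_K_M) and sampling settings below are illustrative assumptions;
# any file from the Model Files table should work the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-4B-Instruct-2507.Q4_K_M.gguf",  # path to a downloaded quant
    n_ctx=32768,       # request a 32K window; the model natively supports up to 262,144 tokens
    n_gpu_layers=-1,   # offload all layers if a GPU build of llama.cpp is available
)

# The instruct model runs in non-thinking mode, so the reply contains no reasoning tags.
out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain grouped-query attention in two sentences."}
    ],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```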
## Model Files
| File Name | Size | Quant Type |
|---|---|---|
| Qwen3-4B-Instruct-2507.BF16.gguf | 8.05 GB | BF16 |
| Qwen3-4B-Instruct-2507.F16.gguf | 8.05 GB | F16 |
| Qwen3-4B-Instruct-2507.F32.gguf | 16.1 GB | F32 |
| Qwen3-4B-Instruct-2507.Q2_K.gguf | 1.67 GB | Q2_K |
| Qwen3-4B-Instruct-2507.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| Qwen3-4B-Instruct-2507.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| Qwen3-4B-Instruct-2507.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| Qwen3-4B-Instruct-2507.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| Qwen3-4B-Instruct-2507.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| Qwen3-4B-Instruct-2507.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| Qwen3-4B-Instruct-2507.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| Qwen3-4B-Instruct-2507.Q6_K.gguf | 3.31 GB | Q6_K |
| Qwen3-4B-Instruct-2507.Q8_0.gguf | 4.28 GB | Q8_0 |
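To fetch a single quant instead of cloning the whole repository, `hf_hub_download` from huggingface_hub works. The sketch below uses the repo id and a filename from the table above; the choice of Q4_K_M is just an example.

```python
# Sketch: download one quant from this repo with huggingface_hub (pip install huggingface_hub).
# Adjust `filename` to pick a different quant from the Model Files table.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="prithivMLmods/Qwen3-4B-Instruct-2507-GGUF",
    filename="Qwen3-4B-Instruct-2507.Q4_K_M.gguf",
)
print(gguf_path)  # local cache path, ready to pass to llama.cpp or llama-cpp-python
```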
## Quants Usage
The listing above is ordered by type and size, not by quality; IQ-quants are often preferable to similarly sized non-IQ quants. A handy graph by ikawrakow compares some lower-quality quant types (lower is better).
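As a rough aid when choosing between the quants above, the sketch below takes the file sizes from the Model Files table and picks the largest quant that fits a given memory budget. The 1.2x overhead factor is an assumption to leave headroom for the KV cache and runtime buffers, not a measured figure; actual memory use also depends on context length and backend.

```python
# Rough quant-selection helper based on the file sizes listed in the Model Files table.
# The 1.2x overhead factor is an assumed margin for KV cache and runtime buffers.
QUANT_SIZES_GB = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.50, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "BF16": 8.05, "F16": 8.05, "F32": 16.1,
}

def pick_quant(budget_gb: float, overhead: float = 1.2) -> str | None:
    """Return the largest quant whose file fits within budget_gb after overhead."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s * overhead <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(6.0))  # ~6 GB budget -> Q8_0
print(pick_quant(3.1))  # ~3 GB budget -> Q4_K_M
```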
## Model tree for prithivMLmods/Qwen3-4B-Instruct-2507-GGUF

Base model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)