Qwen3-4B-Instruct-2507
Model Description
Qwen3-4B-Instruct-2507 is an updated non-thinking variant in the Qwen3 family, designed for instruction-following tasks without generating <think></think> reasoning blocks.
It is trained for stronger general capabilities, including logic, coding, math, science, and long-tail multilingual knowledge, and natively supports a 256K-token (262,144-token) context window.
Features
- Instruction-tuned performance: Strong at instruction following, logical reasoning, text comprehension, and coding.
- Multilingual strength: Expanded long-tail coverage across many languages.
- Massive context window: Handles up to 262,144 tokens natively.
- Clean output: No thinking-mode parsing needed—just straight responses.
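The context-window figure above is 256 × 1024 = 262,144 tokens. As a rough illustration, the sketch below checks that arithmetic and estimates whether a text file is likely to fit. The helper names `estimate_tokens` and `fits_in_context` are hypothetical, and the 4-bytes-per-token ratio is only a coarse assumption, not the model's real tokenizer; actual counts vary by language and content.

```shell
# 256K tokens, as stated in the feature list
CONTEXT_TOKENS=$((256 * 1024))

# Hypothetical helper: estimate the token count of a file.
# ASSUMPTION: roughly 4 bytes per token; this is a heuristic, not the real tokenizer.
estimate_tokens() {
    bytes=$(wc -c < "$1")
    echo $(( (bytes + 3) / 4 ))   # round up
}

# Hypothetical helper: succeeds if the estimate is within the context window.
fits_in_context() {
    [ "$(estimate_tokens "$1")" -le "$CONTEXT_TOKENS" ]
}
```

For example, `fits_in_context book.txt && echo "likely fits"` gives a quick first check before sending a long document; for an exact count, the model's own tokenizer would be needed.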
Use Cases
- High-quality conversational agents and instruction following
- Processing long documents, books, legal texts, and source code
- Multilingual tasks or low-resource language scenarios
Inputs and Outputs
Input: Text prompts—questions, commands, code tasks—without any special thinking mode flags.
Output: Direct, context-aware responses—answers, explanations, code—with no internal thought annotations.
How to use
⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs). Apple NPU support is planned next.
1) Install Nexa-SDK
- Download the SDK and follow the steps under the "Deploy" section of Nexa's model page: Download Windows arm64 SDK
- (Other platforms coming soon)
2) Get an access token
Create a token in the Model Hub, then register it with the CLI, replacing <access_token> with your token:
nexa config set license '<access_token>'
3) Run the model
Run inference from the command line:
nexa infer NexaAI/Qwen3-4B-Instruct-2507-npu
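Steps 2 and 3 can be combined into one small wrapper. This is a minimal sketch assuming a POSIX-style shell (for example Git Bash on Windows); in PowerShell the two commands above would be run directly. The function name `run_model` and the environment variable `NEXA_ACCESS_TOKEN` are hypothetical names chosen for this example; only the two `nexa` commands come from the steps above.

```shell
MODEL="NexaAI/Qwen3-4B-Instruct-2507-npu"

# Hypothetical wrapper around the two documented commands.
run_model() {
    # Fail early if the Nexa CLI is not available
    if ! command -v nexa >/dev/null 2>&1; then
        echo "nexa CLI not found; install Nexa-SDK first" >&2
        return 1
    fi
    # NEXA_ACCESS_TOKEN is an assumed variable name, not part of Nexa-SDK
    if [ -z "${NEXA_ACCESS_TOKEN:-}" ]; then
        echo "NEXA_ACCESS_TOKEN is not set" >&2
        return 2
    fi
    # Step 2: register the access token
    nexa config set license "$NEXA_ACCESS_TOKEN" || return 3
    # Step 3: run the model
    nexa infer "$MODEL"
}

# run_model   # uncomment to execute
```

Reading the token from an environment variable keeps it out of the script file and shell history.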
License
- Licensed under Apache-2.0