
Qwen3-4B-Instruct-2507

Model Description

Qwen3-4B-Instruct-2507 is an updated non-thinking variant in the Qwen3 family, designed for instruction-following tasks without generating <think></think> reasoning blocks.
It is trained for enhanced general capabilities, including logic, coding, math, science, and long-tail multilingual knowledge, and natively supports a 256K-token (262,144-token) context window.

Features

  • Instruction-tuned performance: Strong instruction following, logical reasoning, reading comprehension, and coding.
  • Multilingual strength: Expanded long-tail coverage across many languages.
  • Massive context window: Handles up to 262,144 tokens natively.
  • Clean output: No thinking-mode parsing needed—just straight responses.

Use Cases

  • High-quality conversational agents and instruction following
  • Processing long documents, books, legal texts, and source code
  • Multilingual tasks or low-resource language scenarios

Inputs and Outputs

Input: Text prompts—questions, commands, code tasks—without any special thinking mode flags.
Output: Direct, context-aware responses—answers, explanations, code—with no internal thought annotations.
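
For illustration, below is a minimal sketch of this prompt-in, response-out flow using Hugging Face transformers with the original Qwen/Qwen3-4B-Instruct-2507 checkpoint (an assumption for the example; the NPU build in this repository is meant to be run through Nexa-SDK as described in the next section). The decoded text is the answer itself, with no <think></think> block to strip.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the original full-precision checkpoint; this repo's NPU build runs via Nexa-SDK instead.
model_name = "Qwen/Qwen3-4B-Instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Plain chat prompt, no thinking-mode flags.
messages = [{"role": "user", "content": "Summarize the attention mechanism in two sentences."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The generated continuation is the response itself; no thinking-mode parsing is needed.
output_ids = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)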


How to use

⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.

1) Install Nexa-SDK

  • Download and follow the steps under the "Deploy" section on Nexa's model page: Download Windows arm64 SDK
  • (Other platforms coming soon)

2) Get an access token

Create a token in the Model Hub, then log in:

nexa config set license '<access_token>'

3) Run the model

Running:

nexa infer NexaAI/Qwen3-4B-Instruct-2507-npu

License

  • Licensed under Apache-2.0
