Qwen3-4B-Instruct-2507 is an updated non-thinking variant in the Qwen3 family, designed for instruction-following tasks without generating <think></think> reasoning blocks.
It is trained for stronger general capabilities, including logic, coding, math, science, and long-tail multilingual knowledge, and natively supports a 256K-token context window.
Input: Text prompts (questions, commands, code tasks), with no special thinking-mode flags.
Output: Direct, context-aware responses (answers, explanations, code), with no internal thought annotations.
⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.
Create a token in the Model Hub, then log in:
nexa config set license '<access_token>'
Run the model:
nexa infer NexaAI/Qwen3-4B-Instruct-2507-npu
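The two steps above can be combined into a small wrapper. This is a minimal sketch, not part of the Nexa tooling: the function name `run_model` and the variables `NEXA_TOKEN` and `MODEL` are hypothetical names chosen for illustration. It uses only the two `nexa` commands shown above and assumes the access token is supplied through the environment.

```shell
# Hypothetical wrapper around the two commands from this page.
# NEXA_TOKEN, MODEL, and run_model are illustrative names, not Nexa CLI features.
MODEL="NexaAI/Qwen3-4B-Instruct-2507-npu"

run_model() {
    # Fail early if the nexa CLI is not available.
    if ! command -v nexa >/dev/null 2>&1; then
        echo "nexa CLI not found on PATH" >&2
        return 127
    fi
    # Read the access token from the environment instead of hard-coding it.
    if [ -z "${NEXA_TOKEN:-}" ]; then
        echo "NEXA_TOKEN is not set" >&2
        return 1
    fi
    # Register the license token, then start inference with the NPU model.
    nexa config set license "$NEXA_TOKEN" || return 1
    nexa infer "$MODEL"
}
```

To use it, export `NEXA_TOKEN` with the token created in the Model Hub, then call `run_model`. Keeping the token in an environment variable avoids leaving it in shell scripts.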