
Qwen3-4B-Instruct-2507

Model Description

Qwen3-4B-Instruct-2507 is an updated non-thinking variant in the Qwen3 family, designed for instruction-following tasks without generating <think></think> reasoning blocks.
It is trained for enhanced general capabilities, including logic, coding, math, science, and long-tail multilingual knowledge, and natively supports a 256K-token (262,144-token) context window.

Features

  • Instruction-tuned performance: Strong instruction following, logical reasoning, reading comprehension, and coding.
  • Multilingual strength: Expanded long-tail coverage across many languages.
  • Massive context window: Handles up to 262,144 tokens natively.
  • Clean output: No thinking-mode parsing needed—just straight responses.

Use Cases

  • High-quality conversational agents and instruction following
  • Processing long documents, books, legal texts, and source code
  • Multilingual tasks or low-resource language scenarios

Inputs and Outputs

Input: Text prompts—questions, commands, code tasks—without any special thinking mode flags.
Output: Direct, context-aware responses—answers, explanations, code—with no internal thought annotations.
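
For illustration, below is a minimal sketch of this prompt-in, response-out flow using Hugging Face transformers with the original Qwen/Qwen3-4B-Instruct-2507 checkpoint (an assumption for the example; the NPU build in this repository is meant to be run through Nexa-SDK as described in the next section). The decoded text is the answer itself, with no <think></think> block to strip.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the original full-precision checkpoint; this repo's NPU build runs via Nexa-SDK instead.
model_name = "Qwen/Qwen3-4B-Instruct-2507"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Plain chat prompt, no thinking-mode flags.
messages = [{"role": "user", "content": "Summarize the attention mechanism in two sentences."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The generated continuation is the response itself; no thinking-mode parsing is needed.
output_ids = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)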


How to use

⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.

1) Install Nexa-SDK

  • Download and follow the steps under the "Deploy" section on Nexa's model page: Download Windows arm64 SDK
  • (Other platforms coming soon)

2) Get an access token

Create a token in the Model Hub, then log in:

nexa config set license '<access_token>'

3) Run the model

Running:

nexa infer NexaAI/Qwen3-4B-Instruct-2507-npu

License

  • Licensed under Apache-2.0
