Qwen3-4B

Model Description

Qwen3-4B is a 4-billion-parameter general-purpose language model from the Qwen team at Alibaba Cloud.
Part of the Qwen3 series, it balances strong language understanding, reasoning, and generation performance with efficient deployment at smaller scale.

Trained on a large, high-quality multilingual dataset, Qwen3-4B supports a broad range of NLP tasks and can be fine-tuned for specialized domains.

Features

Conversational AI: context-aware dialogue for chatbots and assistants.
Content generation: articles, marketing copy, code comments, and more.
Reasoning & analysis: structured problem-solving and explanations.
Multilingual: understands and generates multiple languages.
Customizable: adaptable through fine-tuning for domain-specific tasks.

Use Cases

Virtual assistants and customer support
Multilingual content creation
Document summarization and analysis
Education and tutoring applications
Domain-specific fine-tuned models (finance, healthcare, etc.)

Inputs and Outputs

Input:

Text prompts or conversation history (tokenized sequences for APIs).

Output:

Generated text (answers, explanations, creative content).
Optionally, raw logits/probabilities for advanced tasks.

How to use

⚠️ Hardware requirement: the model currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AIPC).
Apple NPU support is planned next.

1) Install Nexa-SDK

Download and follow the steps under "Deploy Section" Nexa's model page: Download Windows arm64 SDK
(Other platforms coming soon)

2) Get an access token

Create a token in the Model Hub, then log in:

nexa config set license '<access_token>'

3) Run the model

Running:

nexa infer NexaAI/qwen3-4B-npu

License

Licensed under: Qwen3-4B LICENSE

References

Model card: https://huggingface.co/Qwen/Qwen3-4B

NexaAI
/

qwen3-4B-npu