tags:
- mlx
---

# nexaml/Qwen3-4B-4bit-MLX

## Quickstart

Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed. In the nexa-sdk CLI:

```bash
nexaml/Qwen3-4B-4bit-MLX
```

## Overview

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) in mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
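
In practice, the thinking/non-thinking switch shows up in the model's output: in thinking mode Qwen3 emits its reasoning trace inside `<think>…</think>` tags before the final reply. A minimal sketch of separating the two (a hypothetical helper for illustration, not part of any Qwen tooling):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    In thinking mode the reasoning trace is wrapped in <think>...</think>;
    in non-thinking mode the block is absent and the whole text is the answer.
    """
    start_tag, end_tag = "<think>", "</think>"
    if start_tag in text and end_tag in text:
        head, _, rest = text.partition(start_tag)
        reasoning, _, answer = rest.partition(end_tag)
        return reasoning.strip(), (head + answer).strip()
    return "", text.strip()

reasoning, answer = split_thinking(
    "<think>The user wants a sum: 2 + 2 = 4.</think>\nThe answer is 4."
)
```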

#### Model Overview

**Qwen3-4B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively and [131,072 tokens with YaRN](#processing-long-texts).
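
The head counts and context figures above imply two simple ratios: each of the 8 KV heads serves a group of 4 query heads, so the KV cache is 4x smaller than under full multi-head attention, and the YaRN-extended window is 4x the native one. A quick arithmetic check (plain Python, no Qwen code involved):

```python
# Head counts from the card: 32 query heads, 8 key/value heads (GQA).
num_q_heads = 32
num_kv_heads = 8

# Each KV head is shared by a group of query heads.
group_size = num_q_heads // num_kv_heads

# The KV cache scales with the number of KV heads, so relative to standard
# multi-head attention (one KV head per query head) it shrinks by:
kv_cache_reduction = num_q_heads / num_kv_heads

# YaRN stretches the native 32,768-token context to 131,072 tokens:
yarn_factor = 131_072 / 32_768

print(group_size, kv_cache_reduction, yarn_factor)  # 4 4.0 4.0
```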

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Reference

**Original model card**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)

