tags:
- mlx
---

# nexaml/Qwen3-4B-4bit-MLX

## Quickstart

Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed. In the nexa-sdk CLI:

```bash
nexaml/Qwen3-4B-4bit-MLX
```

## Overview

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support, with the following key features:

- **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
- **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) in mathematics, code generation, and commonsense logical reasoning.
- **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- **Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.
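
In practice, the thinking/non-thinking switch shows up in the model's output: in thinking mode Qwen3 emits its reasoning trace inside `<think>…</think>` tags before the final reply. A minimal sketch of separating the two (a hypothetical helper for illustration, not part of any Qwen tooling):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    In thinking mode the reasoning trace is wrapped in <think>...</think>;
    in non-thinking mode the block is absent and the whole text is the answer.
    """
    start_tag, end_tag = "<think>", "</think>"
    if start_tag in text and end_tag in text:
        head, _, rest = text.partition(start_tag)
        reasoning, _, answer = rest.partition(end_tag)
        return reasoning.strip(), (head + answer).strip()
    return "", text.strip()

reasoning, answer = split_thinking(
    "<think>The user wants a sum: 2 + 2 = 4.</think>\nThe answer is 4."
)
```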

#### Model Overview

**Qwen3-4B** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 32,768 tokens natively and [131,072 tokens with YaRN](#processing-long-texts).
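
The head counts and context figures above imply two simple ratios: each of the 8 KV heads serves a group of 4 query heads, so the KV cache is 4x smaller than under full multi-head attention, and the YaRN-extended window is 4x the native one. A quick arithmetic check (plain Python, no Qwen code involved):

```python
# Head counts from the card: 32 query heads, 8 key/value heads (GQA).
num_q_heads = 32
num_kv_heads = 8

# Each KV head is shared by a group of query heads.
group_size = num_q_heads // num_kv_heads

# The KV cache scales with the number of KV heads, so relative to standard
# multi-head attention (one KV head per query head) it shrinks by:
kv_cache_reduction = num_q_heads / num_kv_heads

# YaRN stretches the native 32,768-token context to 131,072 tokens:
yarn_factor = 131_072 / 32_768

print(group_size, kv_cache_reduction, yarn_factor)  # 4 4.0 4.0
```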

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Reference

**Original model card**: [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)

