Updated model card
Browse files

README.md CHANGED

@@ -11,25 +11,23 @@ tags:
 - smollm
 ---
 
-Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), you can directly download the `*.pte` and tokenizer file and run the model in a mobile app (see [Running in a mobile app](#running-in-a-mobile-app)).
-
-```Py
-python install_dev.py
-```
-
+[HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) is quantized using [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) with 8-bit embeddings and 8-bit dynamically quantized activations with 4-bit weights in the linear layers (`8da4w`). It is then lowered to [ExecuTorch](https://github.com/pytorch/executorch) with several optimizations (custom SDPA, custom KV cache, and parallel prefill) to achieve high performance on the CPU backend, making it well-suited for mobile deployment.
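+
+For reference, below is a minimal sketch of applying the 8da4w linear quantization with torchao's `quantize_` API. The `group_size=32` and float32 load dtype are illustrative assumptions rather than the exact recipe behind this checkpoint, and the 8-bit embedding quantization is omitted:
+
+```Py
+# Sketch only: 8-bit dynamic activations + 4-bit grouped weights ("8da4w")
+# applied to the linear layers. group_size=32 is an assumed value.
+import torch
+from torchao.quantization import Int8DynamicActivationInt4WeightConfig, quantize_
+from transformers import AutoModelForCausalLM
+
+model = AutoModelForCausalLM.from_pretrained(
+    "HuggingFaceTB/SmolLM3-3B", torch_dtype=torch.float32
+)
+# Quantize every nn.Linear in place with the 8da4w scheme.
+quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
+```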
+
+We provide the [.pte file](https://huggingface.co/pytorch/SmolLM3-3B-8da4w/blob/main/smollm3-3b-8da4w.pte) for direct use in ExecuTorch. *(The provided `.pte` file is exported with the default `max_seq_length`/`max_context_length` of 2k.)*
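+
+Because the repo and file names are fixed, the `.pte` (and the upstream tokenizer) can be fetched with `huggingface_hub`; the `tokenizer.json` filename is an assumption based on the usual layout of the source repo:
+
+```Py
+# Download the published .pte and a tokenizer for use with an ExecuTorch runtime.
+from huggingface_hub import hf_hub_download
+
+pte_path = hf_hub_download(
+    repo_id="pytorch/SmolLM3-3B-8da4w",
+    filename="smollm3-3b-8da4w.pte",
+)
+tokenizer_path = hf_hub_download(  # assumed standard tokenizer location
+    repo_id="HuggingFaceTB/SmolLM3-3B",
+    filename="tokenizer.json",
+)
+print(pte_path, tokenizer_path)
+```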
+
+# Running in a mobile app
+The [.pte file](https://huggingface.co/pytorch/SmolLM3-3B-8da4w/blob/main/smollm3-3b-8da4w.pte) can be run with ExecuTorch on a mobile phone. See the instructions for doing this on [iOS](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) and [Android](https://docs.pytorch.org/executorch/main/llm/llama-demo-android.html).
+
+On Google's Pixel 8 Pro, the model runs at 12.7 tokens/s.
+
+# Running with ExecuTorch’s sample runner
+You can also run this model with ExecuTorch’s sample runner by following [Steps 3 and 4 of these instructions](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-3-run-on-your-computer-to-validate).
+
+# Export Recipe
+You can re-create the `.pte` file from the eager model using this export recipe.
+
+First install `optimum-executorch` by following these [instructions](https://github.com/huggingface/optimum-executorch?tab=readme-ov-file#-quick-installation); then use `optimum-cli` to export the model to ExecuTorch:
 ```Shell
 optimum-cli export executorch \
   --model HuggingFaceTB/SmolLM3-3B \

