---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM3-3B
pipeline_tag: text-generation
library_name: optimum-executorch
tags:
- executorch
- transformers
- optimum-executorch
- smollm
---
# Run on-device with ExecuTorch

This model has been optimized and exported to the ExecuTorch format so it can run on edge devices.

Once ExecuTorch is set up, you can download the `*.pte` program and the tokenizer file directly and run the model in a mobile app (see Running in a mobile app).
## Export to ExecuTorch

First, install the required packages:

```shell
pip install git+https://github.com/huggingface/optimum-executorch@main
```
Then update the dependencies to their latest versions so that the export works with SmolLM3-3B:

```shell
python install_dev.py
```
Use `optimum-cli` to export the model to ExecuTorch:

```shell
optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM3-3B \
  --task text-generation \
  --recipe xnnpack \
  --use_custom_sdpa \
  --use_custom_kv_cache \
  --qlinear \
  --qembedding \
  --output_dir ./smollm3_3b
```
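As a quick sanity check before deploying to a device, the exported program can be loaded and run from Python with the `optimum-executorch` API. This is a sketch under current `optimum-executorch` conventions; the `text_generation` helper and its parameters may change between releases, and the prompt string is just an illustrative placeholder:

```python
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

# Load the exported .pte program from the export output directory above
model = ExecuTorchModelForCausalLM.from_pretrained("./smollm3_3b")

# The tokenizer is unchanged from the original checkpoint
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Run generation on the host CPU (the XNNPACK backend used in the recipe)
output = model.text_generation(
    tokenizer=tokenizer,
    prompt="Give me a short introduction to large language models.",
    max_seq_len=128,
)
print(output)
```

Running the same `.pte` on-device uses the ExecuTorch runtime directly; this Python path is only a convenient way to verify the export on the host.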
## Disclaimer
PyTorch has not performed safety evaluations or red teamed the quantized models. Performance characteristics, outputs, and behaviors may differ from the original models. Users are solely responsible for selecting appropriate use cases, evaluating and mitigating for accuracy, safety, and fairness, ensuring security, and complying with all applicable laws and regulations.
Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the licenses the models are released under, including any limitations of liability or disclaimers of warranties provided therein.