metascroy committed · Commit 8836ab9 · verified · 1 Parent(s): 8ee4bb2

Update README.md

Files changed (1): README.md (+7 -3)
README.md CHANGED
@@ -209,14 +209,16 @@ lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-INT8-INT4 --tasks mm
  We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
  Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.

- We first convert the [quantized checkpoint](https://huggingface.co/pytorch/Qwen3-4B-INT8-INT4/blob/main/pytorch_model.bin) to one ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
+ ExecuTorch's LLM export scripts require the checkpoint keys and parameters to have certain names, which differ from those used by Hugging Face.
+ So we first convert the Hugging Face checkpoint key names to the ones ExecuTorch expects.
  The following script does this for you.
  ```Shell
  python -m executorch.examples.models.qwen3.convert_weights $(hf download pytorch/Qwen3-4B-INT8-INT4) pytorch_model_converted.bin
  ```
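
As an optional sanity check (our sketch, not part of the upstream instructions), you can list a few keys of the converted checkpoint to confirm the rename took effect. Note that `weights_only=False` may be required if the checkpoint stores torchao tensor subclasses, so only load checkpoints you trust this way:

```Shell
# Illustrative check: print the first few keys of the converted checkpoint.
python - <<'EOF'
import torch

state_dict = torch.load(
    "pytorch_model_converted.bin", map_location="cpu", weights_only=False
)
print("\n".join(sorted(state_dict.keys())[:10]))
EOF
```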
 
- Once the checkpoint is converted, we can export to ExecuTorch's pte format with the XNNPACK delegate.
- The below command exports with a max_seq_length/max_context_length of 1024, but it can be changed as desired.
+ Once we have the converted checkpoint, we export it to ExecuTorch's pte format with the XNNPACK backend, using a max_seq_length/max_context_length of 1024 (change this as desired), as follows.
+
+ (Note: the ExecuTorch LLM export script requires config.json to have certain key names. The correct config to use for this model is located at examples/models/qwen3/config/4b_config.json in the ExecuTorch repo.)

  ```Shell
  python -m executorch.examples.models.llama.export_llama \
@@ -236,6 +238,8 @@ python -m executorch.examples.models.llama.export_llama \
  After that you can run the model in a mobile app (see [Running in a mobile app](#running-in-a-mobile-app)).
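
Before wiring it into an app, you can also smoke-test the exported .pte on your development machine. Below is a minimal sketch assuming you have built ExecuTorch's example llama runner per its documentation; the binary path and file names (model.pte, tokenizer.json) are placeholders, and the exact flags may vary across ExecuTorch versions:

```Shell
# Illustrative host-side run with ExecuTorch's example llama runner.
cmake-out/examples/models/llama/llama_main \
  --model_path=model.pte \
  --tokenizer_path=tokenizer.json \
  --prompt="What is the capital of France?"
```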

+ (We try to keep these instructions up-to-date, but if they do not work, check out our [CI test in ExecuTorch](https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_torchao_huggingface_checkpoints.sh) for the latest source of truth, and let us know so we can update the model card.)
+
  # Paper: TorchAO: PyTorch-Native Training-to-Serving Model Optimization
  The model's quantization is powered by **TorchAO**, a framework presented in the paper [TorchAO: PyTorch-Native Training-to-Serving Model Optimization](https://huggingface.co/papers/2507.16099).
245