ONNX Format
Hi, I was wondering if anyone has already converted the safetensors into a format like ONNX (preferably a quantized version). I’d like to run it on edge devices. Thanks! ✌️
Doesn't seem to be supported by the Optimum CLI yet:
https://huggingface.co/docs/transformers/en/serialization
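For reference, this is the export attempt that produces the traceback below; a minimal sketch, assuming the 20b checkpoint and a local output directory:

optimum-cli export onnx --model openai/gpt-oss-20b gpt-oss-20b-onnx/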
/home/Alphag0/Documents/TestProjects/onnx/.venv/lib/python3.11/site-packages/torch/onnx/_internal/registration.py:159: OnnxExporterWarning: Symbolic function 'aten::scaled_dot_product_attention' already registered for opset 14. Replacing the existing function with new function. This is unexpected. Please report it on https://github.com/pytorch/pytorch/issues.
warnings.warn(
MXFP4 quantization requires triton >= 3.4.0 and triton_kernels installed, we will default to dequantizing the model to bf16
Loading checkpoint shards: 100%|██████████| 3/3 [00:31<00:00, 10.35s/it]
Traceback (most recent call last):
File "/home/Alphag0/Documents/TestProjects/onnx/.venv/bin/optimum-cli", line 7, in <module>
sys.exit(main())
^^^^^^
File "/home/Alphag0/Documents/TestProjects/onnx/.venv/lib/python3.11/site-packages/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/home/Alphag0/Documents/TestProjects/onnx/.venv/lib/python3.11/site-packages/optimum/commands/export/onnx.py", line 276, in run
main_export(
File "/home/Alphag0/Documents/TestProjects/onnx/.venv/lib/python3.11/site-packages/optimum/exporters/onnx/__main__.py", line 418, in main_export
onnx_export_from_model(
File "/home/Alphag0/Documents/TestProjects/onnx/.venv/lib/python3.11/site-packages/optimum/exporters/onnx/convert.py", line 1044, in onnx_export_from_model
raise ValueError(
ValueError: Trying to export a gpt_oss model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as `custom_onnx_configs`. Please refer to https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models for an example on how to export custom models. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the model type gpt_oss to be supported natively in the ONNX export.
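Per that error, unsupported architectures need a custom ONNX config passed via custom_onnx_configs. Here is a minimal sketch along the lines of the linked guide; the config class, opset, and task below are my assumptions, and gpt_oss's MoE layers may well need more than this to export cleanly:

from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.config import TextDecoderOnnxConfig
from optimum.utils import NormalizedTextConfig

# Hypothetical config for gpt_oss; native support would live in optimum itself.
class GptOssOnnxConfig(TextDecoderOnnxConfig):
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig
    DEFAULT_ONNX_OPSET = 14

model_id = "openai/gpt-oss-20b"  # assumption: the 20b checkpoint
config = AutoConfig.from_pretrained(model_id)

main_export(
    model_id,
    output="gpt-oss-20b-onnx",
    task="text-generation",
    custom_onnx_configs={"model": GptOssOnnxConfig(config, task="text-generation")},
)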
There is an optimized and quantized ONNX model, and it is available through Foundry Local and AI Toolkit for VS Code. Please see the official OpenAI announcement for more details. I have also uploaded the model to Hugging Face here.
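For anyone who wants to try it quickly: Foundry Local can pull and run the ONNX build straight from its CLI. Sketch, assuming the alias below; check foundry model list for the exact name:

foundry model run gpt-oss-20b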
Any chance of getting the 120b model in ONNX, too?
You can create your own ONNX variants for gpt-oss-20b and gpt-oss-120b with ONNX Runtime GenAI's model builder. Here is the PR to enable that.
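For example, the builder also ships as a module entry point, so an int4 CPU build of the 20b model would look roughly like this (flags per the builder's usual options; adjust -e for your hardware):

python -m onnxruntime_genai.models.builder -m openai/gpt-oss-20b -o ./gpt-oss-20b-onnx -p int4 -e cpu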
I ran this, but I'm getting an error: python builder.py -i d:\models\gpt-oss-120b -o d:\models\gpt-oss-120b-onnx -p int4 -e dml --extra_options int4_op_types_to_quantize=MatMul/Gather
Reading final norm
Reading LM head
Saving ONNX model in d:\models\gpt-oss-120b-onnx
Traceback (most recent call last):
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 99, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 1694, in serialize_tensor_into
tensor_proto.raw_data = from_.tobytes()
^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir_core.py", line 970, in tobytes
return self._evaluate().tobytes()
^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir_core.py", line 931, in _evaluate
return self._func()
^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\src\python\py\models\builder.py", line 547, in tensor_func
tensor = tensor.to(to_torch_dtype(to))
^^^^^^^^^
AttributeError: 'Tensor' object has no attribute 'to'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 99, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 1489, in serialize_graph_into
serialize_tensor_into(graph_proto.initializer.add(), from_=value.const_value)
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 101, in wrapper
raise SerdeError(
onnx_ir.serde.SerdeError: Error calling serialize_tensor_into with: LazyTensor<FLOAT16,[128,2880,5760]>(func=<function Model.make_initializer.<locals>.tensor_func at 0x0000016A47F0CE00>, name='model.layers.0.moe.experts.gate_up_proj.weight')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 99, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 1294, in serialize_model_into
serialize_graph_into(model_proto.graph, from_.graph)
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 101, in wrapper
raise SerdeError(
onnx_ir.serde.SerdeError: Error calling serialize_graph_into with: name=main_graph, doc_string=None, len(inputs)=75, len(initializers)=581, len(nodes)=1824, len(outputs)=73, metadata_props={}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\projects\onnxruntime-genai\src\python\py\models\builder.py", line 4514, in
create_model(args.model_name, args.input, args.output, args.precision, args.execution_provider, args.cache_dir, **extra_options)
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\src\python\py\models\builder.py", line 4377, in create_model
onnx_model.save_model(output_dir)
File "C:\projects\onnxruntime-genai\src\python\py\models\builder.py", line 501, in save_model
model = self.to_int4()
^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\src\python\py\models\builder.py", line 484, in to_int4
model=ir.to_proto(self.model),
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 276, in to_proto
return serialize_model(ir_object)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 1266, in serialize_model
return serialize_model_into(onnx.ModelProto(), from_=model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\projects\onnxruntime-genai\venv\Lib\site-packages\onnx_ir\serde.py", line 101, in wrapper
raise SerdeError(
onnx_ir.serde.SerdeError: Error calling serialize_model_into with: ir_version=10, producer_name=onnxruntime-genai, producer_version=None, domain=None,
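From the traceback, the int4 save path hands tensor_func an onnx_ir tensor rather than a torch.Tensor, so the .to(...) call at builder.py line 547 fails. A hedged local workaround, not a proper fix: bridge through numpy before the dtype cast. This assumes the incoming tensor exposes .numpy(), and it only works here because the failing initializer is FLOAT16, which numpy can represent:

import torch

# Sketch of a guard just before the failing line in builder.py's tensor_func.
if not isinstance(tensor, torch.Tensor):
    tensor = torch.from_numpy(tensor.numpy())  # assumption: onnx_ir tensor -> numpy -> torch
tensor = tensor.to(to_torch_dtype(to))  # the original line 547

Either way this looks like a bug in the builder's lazy-initializer path, so it's probably worth reporting on the onnxruntime-genai repo.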