Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ❌ Based on `decoder_model.onnx` *with* slimming

```
Processing /tmp/tmpd9dp6wxb/decoder_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
 - Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
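The `UserWarning` in this log is emitted during the fp16 conversion: the float32 value -3.4028234663852886e+38 is far outside the float16 range, whose largest finite magnitude is 65504, so it is replaced by -10000.0. The sketch below only illustrates that kind of clamping and is not the code in `float16.py`; the helper names `fits_in_fp16` and `clamp_for_fp16` and the `max_finite` default of 1e4 are assumptions inferred from the warning text.

```python
import struct


def fits_in_fp16(value: float) -> bool:
    """Return True if value can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the struct format code for float16
        return True
    except OverflowError:
        return False


def clamp_for_fp16(value: float, max_finite: float = 1e4) -> float:
    """Clamp value into [-max_finite, max_finite] (hypothetical helper)."""
    return max(-max_finite, min(max_finite, value))


float32_min = -3.4028234663852886e+38  # value reported in the warning
print(fits_in_fp16(float32_min))    # False
print(clamp_for_fp16(float32_min))  # -10000.0
```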

### ❌ Based on `decoder_model.onnx` *without* slimming

```
Processing /tmp/tmpn390r9dn/decoder_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
 - Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
    model_fp16 = float16.convert_float_to_float16(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
    process_graph_output(curr_graph, is_top_level, keep_io_types)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
    assert len(upstream_nodes) == 1 # Should be only one node
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
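The `AssertionError` comes from `assert len(upstream_nodes) == 1` in `process_graph_output`, which suggests that some graph output did not have exactly one node producing it. The sketch below shows one way to look for such outputs. It is a diagnostic aid under assumptions, not part of the quantization scripts: `outputs_without_single_producer` is a hypothetical helper, and it relies only on the `node`, `output`, and `name` fields that an ONNX `GraphProto` exposes, so small stand-in objects are used here instead of a real model.

```python
from types import SimpleNamespace


def outputs_without_single_producer(graph):
    """Map each graph output name to its producer count when that count is not 1."""
    problems = {}
    for graph_output in graph.output:
        producers = [n for n in graph.node if graph_output.name in n.output]
        if len(producers) != 1:
            problems[graph_output.name] = len(producers)
    return problems


# Stand-in graph (made up for illustration): "logits" has one producer,
# "passthrough" has none.
toy_graph = SimpleNamespace(
    node=[SimpleNamespace(name="MatMul_0", output=["logits"])],
    output=[SimpleNamespace(name="logits"), SimpleNamespace(name="passthrough")],
)
print(outputs_without_single_producer(toy_graph))  # {'passthrough': 0}
```

With the `onnx` package installed, the same function could be applied to `onnx.load(path).graph` for one of the decoder files.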

### ❌ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `encoder_model_fp16.onnx` (added)
↳ ✅ `q8`: `encoder_model_quantized.onnx` (added)
↳ ❌ `int8`: `encoder_model_int8.onnx` (added but JS-based E2E test failed)
```
dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
__classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
    at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
    at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
    at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0
```
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
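
The failed end-to-end test for the `int8` encoder reports that the runtime has no implementation for the `ConvInteger` node `/conv1/Conv_quant`. One way to check which nodes of a given operator type a quantized model contains is to filter the graph's nodes by `op_type`. The following is a minimal sketch: `nodes_with_op_type` is a hypothetical helper, and the stand-in graph replaces a real ONNX `GraphProto`.

```python
from types import SimpleNamespace


def nodes_with_op_type(graph, op_types):
    """Return the names of nodes whose operator type is in op_types."""
    return [node.name for node in graph.node if node.op_type in op_types]


# Stand-in graph: the first node name is taken from the error message above,
# the second node is made up for illustration.
toy_graph = SimpleNamespace(
    node=[
        SimpleNamespace(name="/conv1/Conv_quant", op_type="ConvInteger"),
        SimpleNamespace(name="/fc/MatMul_quant", op_type="MatMulInteger"),
    ]
)
print(nodes_with_op_type(toy_graph, {"ConvInteger"}))  # ['/conv1/Conv_quant']
```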

### ❌ Based on `decoder_with_past_model.onnx` *with* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpc36deqoq/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]

0%| | 0/7 [00:00<?, ?it/s][A

- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s][A/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(

- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]

Processing /tmp/tmpc36deqoq/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
model_fp16 = float16.convert_float_to_float16(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
process_graph_output(curr_graph, is_top_level, keep_io_types)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
assert len(upstream_nodes) == 1 # Should be only one node
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```

### ❌ Based on `decoder_with_past_model.onnx` *without* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmp11fmkou5/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]

0%| | 0/7 [00:00<?, ?it/s][A

- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s][A/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(

- Quantizing to fp16: 0%| | 0/7 [00:00<?, ?it/s]

Processing /tmp/tmp11fmkou5/decoder_with_past_model.onnx: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 217, in quantize_fp16
model_fp16 = float16.convert_float_to_float16(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 273, in convert_float_to_float16
process_graph_output(curr_graph, is_top_level, keep_io_types)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py", line 372, in process_graph_output
assert len(upstream_nodes) == 1 # Should be only one node
^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```

### ✅ Based on `decoder_model_merged.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (added)
↳ ✅ `q8`: `decoder_model_merged_quantized.onnx` (added)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
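
The lists above follow a consistent naming pattern: each quantization type maps to a file-name suffix, with `q8` using `_quantized` and every other type using its own name. The sketch below restates that pattern; `DTYPE_SUFFIX` and `quantized_filename` are hypothetical names for illustration and are not claimed to be how Transformers.js resolves files.

```python
# Suffixes as they appear in the file names listed in this pull request.
DTYPE_SUFFIX = {
    "fp16": "fp16",
    "q8": "quantized",
    "int8": "int8",
    "uint8": "uint8",
    "q4": "q4",
    "q4f16": "q4f16",
    "bnb4": "bnb4",
}


def quantized_filename(base: str, dtype: str) -> str:
    """Build the ONNX file name for a base model name and quantization type."""
    return f"{base}_{DTYPE_SUFFIX[dtype]}.onnx"


print(quantized_filename("decoder_model_merged", "q8"))  # decoder_model_merged_quantized.onnx
print(quantized_filename("encoder_model", "q4f16"))      # encoder_model_q4f16.onnx
```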

Files changed (13)

onnx/decoder_model_merged_bnb4.onnx +3 -0
onnx/decoder_model_merged_fp16.onnx +3 -0
onnx/decoder_model_merged_int8.onnx +3 -0
onnx/decoder_model_merged_q4.onnx +3 -0
onnx/decoder_model_merged_q4f16.onnx +3 -0
onnx/decoder_model_merged_quantized.onnx +3 -0
onnx/decoder_model_merged_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_fp16.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_quantized.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

+version https://git-lfs.github.com/spec/v1
+oid sha256:e9efe7b6d2fcebb789ef507ddab70073886b046e5df9b59d62a17a1d63087e9b
+size 3505576

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3a07717558c26e2d541c9fa73825a5cec0b076ac948edd2d6d9c66a3663c3a5
+size 1900027

+version https://git-lfs.github.com/spec/v1
+oid sha256:85f676f67b06bba2e0d142b2d4f821920382329b285d35ec8b23ce36257191a5
+size 4380094

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c7f5f75f6ea3d72a2303b1527911e5a2d71d2f7d3c76410754f3b935a05b282
+size 3525249

+version https://git-lfs.github.com/spec/v1
+oid sha256:e95914a213a2aebaf7b1eaf45e614335321f63aa2292268339fbbbbd15c2f898
+size 1844139

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c84424e2cee24597afd7e362e43c02747d1782bec06b71f477543f2259092b4
+size 4380098

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2e9afd5c068b30084262bc4399415216f34dae387ba55589d409b899758566a
+size 164073

+version https://git-lfs.github.com/spec/v1
+oid sha256:bdf73761fc49b24f437022e55adec9c1e89813a77c159ca497c7c88e27bf4621
+size 175141

+version https://git-lfs.github.com/spec/v1
+oid sha256:4495d6476d144ede07065cbbd9f8f7a109f08e9ec5c044a08f33cfe9551a30da
+size 183164

+version https://git-lfs.github.com/spec/v1
+oid sha256:8527469d7e5f87cac0bab3e62597b7f3184f9ef2a1a255e8ebc5f10b4cb1c0fa
+size 118285

+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449
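
Each three-line entry above is a Git LFS pointer: a spec version, a SHA-256 object ID, and the size of the real file in bytes. The sketch below parses one such entry as it appears in the diff, including the leading `+` markers; `parse_lfs_pointer` is a hypothetical helper written for illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer (optionally with leading '+' diff markers)."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


pointer = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:abdbb9cd2803830817e20e2760f5bd52905b85db8dcecbb2a6a861de14088337
+size 179449
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 179449
```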