Unable to load model from .NET (using the latest version of the runtime)
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.9.0" />
Loading the model fails with an 'input size 12' error:
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: 'Load model from E:\s\models\gpt-oss-20b-onnx\cuda\cuda-int4-kquant-block-32-mixed\model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/Add/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"","","model.layers.0.attn.sinks": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node(/model/layers.0/attn/GroupQueryAttention) with schema(com.microsoft::GroupQueryAttention:1) has input size 12 not in range [min=7, max=11].'
OK, I see from a previous discussion that I need to get the nightly build. Let me figure that out.
I'm still seeing the same issue with the nightly version from:
https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/nuget/v3/index.json
<PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" Version="1.23.0-dev-20250429-1449-a9a3ad2e0c" />
<PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.9.0" />
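For anyone trying to reproduce this: a feed like the one above is normally registered through a NuGet.config file next to the project or solution. The following is a minimal sketch, not a confirmed copy of the actual setup. The key name `ORT-Nightly` is only a label chosen here for illustration (any name works), and keeping nuget.org as a second source is an assumption so that other packages still restore.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- Default public feed, kept so that non-nightly packages still restore (assumption) -->
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
    <!-- Nightly feed URL from this thread; the key name is an arbitrary label -->
    <add key="ORT-Nightly" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/nuget/v3/index.json" />
  </packageSources>
</configuration>
```

With a file like this in place, `dotnet restore` should resolve the `-dev-` package versions from the nightly feed, provided the exact version string is listed there.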
The date on the package shows as 'Tuesday, April 29, 2025 (4/29/2025)', which matches the 20250429 stamp in the version string. Maybe the build pipeline is broken?