Qwen3-1.7B — Apple Core AI Export (iPhone GPU)

Pre-converted Apple Core AI (.aimodelc) bundle of Qwen/Qwen3-1.7B, produced with the coreai-models export recipe and presented without modification. Hashes are embedded so the artifact is a reproducible reference point.
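
This card does not say where the embedded hashes live or how they are computed. The following Python function is a general-purpose sketch, not the coreai-models recipe's own hashing. It computes a deterministic SHA-256 fingerprint over a downloaded `.aimodelc` directory, which is enough to confirm that two copies of the bundle are byte-identical. The name `bundle_digest` is hypothetical.

```python
import hashlib
from pathlib import Path


def bundle_digest(bundle_dir: str) -> str:
    """SHA-256 fingerprint over every file in a bundle directory.

    Files are visited in sorted relative-path order, and each file's path and
    size are hashed before its bytes, so the result is stable across machines
    and changes if any file is added, removed, renamed, or modified.
    """
    root = Path(bundle_dir)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    h = hashlib.sha256()
    for path in files:
        rel = path.relative_to(root).as_posix()
        h.update(f"{rel}\0{path.stat().st_size}\0".encode("utf-8"))
        with path.open("rb") as f:
            # Stream in 1 MiB chunks so large weight files are not loaded whole.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()
```

If two downloads produce the same hex string, they are identical. Whether that string matches the hashes the export recipe embeds depends on how the recipe computes them, and this card does not specify that.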

This fills the missing 1.7B rung in the dense Qwen3 Core AI line (0.6B, 4B, and 8B already exist). It is a meaningful rung. 1.7B is the largest dense Qwen3 model that still invokes on LiteRT-LM for iOS, as measured alongside this bundle in a neutral cross-runtime benchmark. That makes it the largest size at which the compared runtimes can be measured head to head.

Why GPU-only (no ANE bundle)

The 0.6B repo ships two bundles: an ANE bundle (static-shape, palettized) and a GPU bundle (dynamic INT4). At 1.7B the ANE export is omitted on purpose. The static-shape ANE bundle loads but fails to invoke on iOS 27, producing no output over a full benchmark run window. This is the same ANE invoke ceiling that the 4B static export hits. Rather than ship a bundle that does not run, this repo publishes only the GPU (dynamic INT4) export, which invokes and decodes cleanly on device. Core AI's GPU path is unaffected and runs 0.6B, 1.7B, and 4B on iPhone. The ANE path is the one that tops out below 1.7B.

Bundle

Path	Target	Compute unit	Quant	On disk
`ios-gpu/`	iPhone (h18p)	GPU (`coreai-pipelined`)	dynamic INT4	939 MB

The bundle embeds the Qwen/Qwen3-1.7B tokenizer and has a maximum context length of 40,960 tokens. The iOS bundle is already AOT-compiled (.aimodelc) for the iPhone 17 Pro GPU target.

Measured — iPhone 17 Pro (iPhone18,1 · iOS 27.0)

All runs use greedy decoding. Short-chat runs use a 128-token generation budget (n = 3, iso-cold), and quality runs use a 256-token budget. Every figure traces to raw JSONL in the companion benchmark.

Metric	Value
Decode	44.7 tok/s cold → ~66 tok/s warm
TTFT (warm)	~29 ms
Prefill	~750 tok/s
Peak RAM	248 MB
Quality (8 checkable questions)	8 / 8, output not degenerate

The kernel cache persists across launches, so only the first launch after a fresh install is genuinely cold (44.7 tok/s). Once the cache is primed, decode holds at ~66 tok/s, which is the steady state a user actually sees. That ties the fastest runtime measured at this size (MLX Q4, ~66 tok/s at 1095 MB) and is well ahead of LiteRT-LM int8 (~30 tok/s at 512 MB). This bundle's peak RAM is 248 MB.
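
The cold/warm gap and the runtime comparison translate directly into wall-clock time. The Python sketch below applies simple arithmetic to the approximate rates quoted above. It assumes a constant decode rate and ignores prefill, and `decode_seconds` is a hypothetical helper, not part of the benchmark.

```python
# Approximate decode rates (tok/s) quoted on this card for Qwen3-1.7B on
# iPhone 17 Pro. These are the card's rounded figures, not new measurements.
DECODE_TOK_PER_S = {
    "Core AI GPU INT4, cold": 44.7,
    "Core AI GPU INT4, warm": 66.0,
    "MLX Q4": 66.0,
    "LiteRT-LM int8": 30.0,
}


def decode_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Time to emit n_tokens at a constant decode rate.

    Deliberately ignores prefill and time to first token, so it slightly
    underestimates end-to-end latency.
    """
    return n_tokens / tok_per_s


# Decode time for the 128-token short-chat budget used in the benchmark.
for runtime, rate in DECODE_TOK_PER_S.items():
    print(f"{runtime:24s} {decode_seconds(128, rate):.2f} s")
```

At these rates, a 128-token reply takes roughly 1.9 s warm, 2.9 s cold, and about 4.3 s on LiteRT-LM int8. The warm TTFT of ~29 ms adds little on top of any of these.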

Usage

The iOS bundle is AOT-compiled and must be side-loaded into the app container. Load it through Core AI using the embedded tokenizer and the `coreai-pipelined` GPU engine. See the coreai-models recipe for details, and CoreAIChatMac for an interactive harness.
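
Before side-loading, only the `ios-gpu/` folder needs to be on your Mac. The sketch below assumes the `huggingface_hub` Python package and the repository ID `mlboydaisuke/qwen3-1.7b-CoreAI-official`. `snapshot_download` with `allow_patterns` is a real `huggingface_hub` function, but `fetch_gpu_bundle` and `find_aimodelc` are hypothetical helper names. This card does not state the `.aimodelc` filename inside `ios-gpu/`, so the helper searches for it instead of hard-coding it. The sketch does not perform the side-load or the Core AI load, because that API is not documented here.

```python
from pathlib import Path


def fetch_gpu_bundle(local_dir: str) -> Path:
    """Download only the ios-gpu/ folder of the repo into local_dir."""
    # Imported lazily so find_aimodelc() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="mlboydaisuke/qwen3-1.7b-CoreAI-official",
        allow_patterns=["ios-gpu/*"],  # skip everything outside ios-gpu/
        local_dir=local_dir,
    )
    return Path(path)


def find_aimodelc(root: Path) -> list[Path]:
    """Return every compiled .aimodelc bundle directory under root."""
    return sorted(p for p in root.rglob("*.aimodelc") if p.is_dir())
```

For example, `find_aimodelc(fetch_gpu_bundle("qwen3-1.7b-coreai"))` returns the compiled bundle path or paths. You can then copy them into the app container with whatever side-loading workflow your app already uses.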

Provenance

Converted from Qwen/Qwen3-1.7B (Apache-2.0). Quant: dynamic INT4 (linear). Producer: coreai-build-3600.67.5.8.1.
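
This card says only "dynamic INT4 (linear)". It does not state the scale granularity (per-channel or per-block), whether the mapping is symmetric or affine, or which layers are quantized. The plain-Python sketch below shows what linear quantization means in contrast to the palettized (lookup-table) ANE variant. It implements generic symmetric per-output-channel INT4 quantization. It illustrates the technique only and is not the Core AI converter's implementation. Both function names are hypothetical.

```python
def quantize_int4_per_channel(rows: list[list[float]]):
    """Symmetric linear INT4 quantization with one scale per output row."""
    q_rows, scales = [], []
    for row in rows:
        max_abs = max(abs(w) for w in row)
        # Map [-max_abs, max_abs] onto [-7, 7]; an all-zero row gets scale 1.0.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
        q = [max(-8, min(7, round(w / scale))) for w in row]
        q_rows.append(q)
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Reconstruct approximate float weights as w ≈ q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

Each weight becomes a 4-bit integer plus a shared per-row scale, so storage is roughly half a byte per weight. The reconstruction error per weight is at most half that row's scale. A palettized scheme instead stores small indices into a table of representative values.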


Model tree

Qwen/Qwen3-1.7B-Base (base model) → Qwen/Qwen3-1.7B (finetune) → mlboydaisuke/qwen3-1.7b-CoreAI-official (this model)