
Focus areas:
- Text Generation & Chat Assistants
- Model Compression & Quantization (Q4/Q5/Q6/Q8, gs32; see the conversion sketch below)
- Inference & Serving (on-prem, low-latency)
- RAG / Retrieval
- Agents & Tool Use
- Distillation / LoRA / Fine-tuning
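As a sketch of how a gs32 build like these can be produced with mlx-lm's converter (assumptions: `pip install mlx-lm`, openai/gpt-oss-20b as the upstream source, and keyword names from recent mlx-lm releases, which may differ across versions):

```python
# Sketch: producing a 5-bit, group-size-32 MLX quantization with mlx-lm.
# The upstream repo id and exact keyword names are assumptions based on
# recent mlx-lm releases.
from mlx_lm import convert

convert(
    "openai/gpt-oss-20b",                    # upstream weights (assumption)
    mlx_path="gpt-oss-20b-MLX-5bit-gs32",    # local output directory
    quantize=True,
    q_bits=5,                                # 4/5/6 for the builds listed below
    q_group_size=32,                         # gs32: finer groups, better accuracy
)
```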
High-quality, Apple-Silicon–optimized MLX builds, tools, and evals — focused on practical, on-prem inference for small teams.
We publish Mixture-of-Experts (MoE) models and MLX quantizations tuned for M-series Macs (Metal + unified memory).
Target use: fast, reliable interactive chat and light batch workloads.
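A minimal interactive-chat sketch with mlx-lm (assuming `pip install mlx-lm`; `generate()` options vary slightly across mlx-lm versions):

```python
# Minimal chat-style generation on an M-series Mac with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-5bit-gs32")

messages = [{"role": "user", "content": "Outline an on-prem RAG pipeline in three steps."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```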
| Repo | Bits / group size | Footprint | Notes |
|---|---|---|---|
| halley-ai/gpt-oss-20b-MLX-4bit-gs32 | Q4 / 32 | ~13.1 GB | Trades accuracy for footprint; use when RAM is constrained or throughput is the priority. |
| halley-ai/gpt-oss-20b-MLX-5bit-gs32 | Q5 / 32 | ~15.8 GB | Small drop vs 6-bit/gs32 and 8-bit/gs64 (~3–6% PPL); the "fits in 16 GB" option when GPU buffer limits matter. |
| halley-ai/gpt-oss-20b-MLX-6bit-gs32 | Q6 / 32 | ~18.4 GB | Best of the group; edges out 8-bit/gs64 slightly at a smaller footprint. |
| Reference (8-bit) | Q8 / 64 | — | See upstream: lmstudio-community/gpt-oss-20b-MLX-8bit. |
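The Footprint figures are consistent with simple effective-bits arithmetic: MLX's affine quantization stores a scale and a bias per group, so gs32 adds roughly one extra bit per weight (two 16-bit values per 32 weights). A back-of-the-envelope check, treating all ~20.9B parameters as quantized (an approximation; embeddings and any unquantized layers shift the totals slightly):

```python
# Approximate on-disk footprint for gs32 affine quantization.
# Assumes ~20.9e9 quantized parameters and one fp16 scale + fp16 bias
# per 32-weight group, i.e. ~1 extra bit per weight.
PARAMS = 20.9e9
GROUP_SIZE = 32
OVERHEAD_BITS = 2 * 16 / GROUP_SIZE  # scale + bias per group -> 1.0 bit/weight

for bits in (4, 5, 6):
    gb = PARAMS * (bits + OVERHEAD_BITS) / 8 / 1e9
    print(f"Q{bits}/gs{GROUP_SIZE}: ~{gb:.1f} GB")
# Prints ~13.1, ~15.7, ~18.3 GB, in line with the table above.
```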
Format: MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
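For those platforms, a minimal llama-cpp-python sketch (assuming `pip install llama-cpp-python`; the GGUF path is a placeholder, since these repos ship MLX weights only):

```python
# Sketch for non-MLX stacks: run a GGUF build via llama-cpp-python.
# The model path is a placeholder; source a GGUF of gpt-oss-20b separately.
from llama_cpp import Llama

llm = Llama(model_path="path/to/gpt-oss-20b.gguf", n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a non-MLX stack."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```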