| --- |
| license: apache-2.0 |
| pipeline_tag: text-generation |
| library_name: mlx |
| tags: |
| - vllm |
| - mlx |
| base_model: openai/gpt-oss-120b |
| --- |
| **See gpt-oss-120b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)** |
|
|
*The q6.5-bit quant typically achieves a perplexity of 1.128 in our testing, matching the q8 result (1.128).*
| | Quantization | Perplexity | |
| |:------------:|:----------:| |
| | **q2** | 41.293 | |
| | **q3** | 1.900 | |
| | **q4** | 1.168 | |
| | **q6** | 1.128 | |
| | **q8** | 1.128 | |
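
Below is a rough sketch of how a perplexity figure like those in the table above could be measured with mlx-lm. The repo path and evaluation text are placeholders, and the exact evaluation setup used for this card may differ.

```python
# Sketch only: placeholder repo path and toy evaluation text.
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("path/to/gpt-oss-120b-6.5bit-mlx")  # placeholder path

# Replace with a held-out evaluation corpus for meaningful numbers.
text = "MLX is an array framework for machine learning on Apple silicon."
tokens = mx.array([tokenizer.encode(text)])

# Average next-token negative log-likelihood over the sequence, then exponentiate.
logits = model(tokens[:, :-1])
log_probs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
nll = -mx.take_along_axis(log_probs, tokens[:, 1:, None], axis=-1).mean()
print(f"Perplexity: {mx.exp(nll).item():.3f}")
```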
|
|
| ## Usage Notes |
|
|
| * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26 |
| * Memory usage: ~95 GB |
| * Expect ~60 tokens/s |
* For more details see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
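
A minimal usage sketch with the mlx-lm Python API follows; the repo path is a placeholder for wherever this 6.5-bit conversion is hosted, not an official identifier.

```python
# Sketch only: placeholder repo path.
from mlx_lm import load, generate

model, tokenizer = load("path/to/gpt-oss-120b-6.5bit-mlx")  # placeholder path

messages = [{"role": "user", "content": "Summarize what 6.5-bit quantization trades off versus q8."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generation speed of roughly ~60 tokens/s was observed (see notes above).
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

The same can be done from the command line with `mlx_lm.generate --model <repo> --prompt "..."`.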