Create README.md

---
license: apache-2.0
language:
- en
tags:
- gguf
- quantized
- gpt-oss
- llama-cpp
- mixture-of-experts
- f16
model_type: gpt-oss
---

# GPT-OSS 20B F16 GGUF

This is a high-quality F16 GGUF conversion of the GPT-OSS 20B model, optimized for llama.cpp inference.

## Model Details

- **Model Type**: GPT-OSS (Mixture of Experts)
- **Parameters**: 20.91B total, ~3.6B active per token
- **Precision**: F16 (16-bit floating point)
- **File Size**: 12.83 GiB
- **Context Length**: 131,072 tokens
- **Experts**: 32 total, 4 active per token
- **Architecture**: Sliding-window attention with expert routing

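These values can be read directly from the GGUF header. A minimal sketch, assuming the `gguf` Python package (which installs a `gguf-dump` utility; exact flag names may vary between versions):

```bash
# Install the GGUF inspection tooling.
pip install gguf

# Print the key/value metadata (architecture, context length,
# expert count, rope-scaling parameters) without the tensor list.
gguf-dump --no-tensors gpt-oss-20B-F16.gguf
```
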
## Key Features

- ✅ **Fully Functional**: All 459 tensors intact, including MXFP4 expert weights
- ✅ **High Quality**: F16 precision maintains model performance
- ✅ **Complete MoE Support**: Expert routing and gating fully preserved
- ✅ **Extended Context**: 131K token context window with YaRN scaling

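The tensor-count and expert-weight claims above can be spot-checked against the same dump. A rough sketch, assuming llama.cpp's usual MoE tensor naming (`blk.<layer>.ffn_{gate,down,up}_exps.weight` is an assumption about this conversion, not confirmed from the file):

```bash
# List the per-expert FFN tensors; an empty result would mean the
# expert weights did not survive conversion.
gguf-dump gpt-oss-20B-F16.gguf | grep "_exps"
```
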
## Usage with llama.cpp

```bash
# Download the model
huggingface-cli download davidfred/gpt-oss-20b-f16-gguf gpt-oss-20B-F16.gguf
```
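
After the download, a typical invocation might look like the following (a sketch based on recent llama.cpp builds; `-ngl 99` offloads all layers to the GPU and can be dropped for CPU-only runs):

```bash
# Interactive chat with GPU offload and an 8K context window.
llama-cli -m gpt-oss-20B-F16.gguf -ngl 99 -c 8192 -cnv

# Or serve an OpenAI-compatible HTTP API on port 8080.
llama-server -m gpt-oss-20B-F16.gguf -ngl 99 -c 8192 --port 8080
```

Raising `-c` toward the full 131,072-token window works the same way, at the cost of more KV-cache memory.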