Create README.md

---
license: apache-2.0
language:
- en
tags:
- gguf
- quantized
- gpt-oss
- llama-cpp
- mixture-of-experts
- f16
model_type: gpt-oss
---

# GPT-OSS 20B F16 GGUF

This is a high-quality F16 GGUF conversion of the GPT-OSS 20B model, optimized for llama.cpp inference.

## Model Details

- **Model Type**: GPT-OSS (Mixture of Experts)
- **Parameters**: 20.91B total, ~3.6B active per token
- **Precision**: F16 (16-bit floating point)
- **File Size**: 12.83 GiB
- **Context Length**: 131,072 tokens
- **Experts**: 32 total, 4 active per token
- **Architecture**: Sliding-window attention with expert routing

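These values can be read directly from the GGUF header. A minimal sketch, assuming the `gguf` Python package (which installs a `gguf-dump` utility; exact flag names may vary between versions):

```bash
# Install the GGUF inspection tooling.
pip install gguf

# Print the key/value metadata (architecture, context length,
# expert count, rope-scaling parameters) without the tensor list.
gguf-dump --no-tensors gpt-oss-20B-F16.gguf
```
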
## Key Features

- ✅ **Fully Functional**: All 459 tensors intact, including MXFP4 expert weights
- ✅ **High Quality**: F16 precision maintains model performance
- ✅ **Complete MoE Support**: Expert routing and gating fully preserved
- ✅ **Extended Context**: 131K token context window with YaRN scaling

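The tensor-count and expert-weight claims above can be spot-checked against the same dump. A rough sketch, assuming llama.cpp's usual MoE tensor naming (`blk.<layer>.ffn_{gate,down,up}_exps.weight` is an assumption about this conversion, not confirmed from the file):

```bash
# List the per-expert FFN tensors; an empty result would mean the
# expert weights did not survive conversion.
gguf-dump gpt-oss-20B-F16.gguf | grep "_exps"
```
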
## Usage with llama.cpp

```bash
# Download the model
huggingface-cli download davidfred/gpt-oss-20b-f16-gguf gpt-oss-20B-F16.gguf
```
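
After the download, a typical invocation might look like the following (a sketch based on recent llama.cpp builds; `-ngl 99` offloads all layers to the GPU and can be dropped for CPU-only runs):

```bash
# Interactive chat with GPU offload and an 8K context window.
llama-cli -m gpt-oss-20B-F16.gguf -ngl 99 -c 8192 -cnv

# Or serve an OpenAI-compatible HTTP API on port 8080.
llama-server -m gpt-oss-20B-F16.gguf -ngl 99 -c 8192 --port 8080
```

Raising `-c` toward the full 131,072-token window works the same way, at the cost of more KV-cache memory.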