davidfred committed on
Commit 00d41f8 · verified · 1 Parent(s): 32630a6

Create README.md

README.md ADDED
@@ -0,0 +1,40 @@
---
license: apache-2.0
language:
- en
tags:
- gguf
- quantized
- gpt-oss
- llama-cpp
- mixture-of-experts
- f16
model_type: gpt-oss
---

# GPT-OSS 20B F16 GGUF

This is a high-quality F16 GGUF conversion of the GPT-OSS 20B model, optimized for llama.cpp inference.

## Model Details

- **Model Type**: GPT-OSS (Mixture of Experts)
- **Parameters**: 20.91B total, 1.8B active
- **Precision**: F16 (16-bit floating point)
- **File Size**: 12.83 GiB
- **Context Length**: 131,072 tokens
- **Experts**: 32 total, 4 active per token
- **Architecture**: Sliding window attention with expert routing

## Key Features

- ✅ **Fully Functional**: All 459 tensors intact, including mxfp4 expert weights
- ✅ **High Quality**: F16 precision maintains model performance
- ✅ **Complete MoE Support**: Expert routing and gating fully preserved
- ✅ **Extended Context**: 131K token context window with YARN scaling
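
The tensor count and context length claimed above can be checked directly against the GGUF header. A minimal sketch, not from the model card: it assumes the `gguf` Python package from llama.cpp's `gguf-py` tooling is installed, and that the file has already been downloaded under the name used in the usage section:

```bash
# Install the GGUF inspection tooling (assumption: the pip package is named `gguf`)
pip install gguf

# Dump the metadata key-value pairs and the tensor table; the tensor count
# and context length reported here should match the figures in this README
gguf-dump gpt-oss-20B-F16.gguf
```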

## Usage with llama.cpp

```bash
# Download the model
huggingface-cli download davidfred/gpt-oss-20b-f16-gguf gpt-oss-20B-F16.gguf
```
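
Once downloaded, the file can be run with llama.cpp's CLI. A minimal sketch, not from the model card: the binary name `llama-cli` assumes a recent llama.cpp build (older releases call it `main`), and `-ngl` should be tuned to however many layers fit on your GPU:

```bash
# Run a one-shot prompt: -m selects the model file, -c sets the context
# window (anything up to 131072 for this model), -n caps generated tokens,
# and -ngl offloads that many layers to the GPU if one is available
./llama-cli -m gpt-oss-20B-F16.gguf \
  -p "Explain mixture-of-experts routing in one paragraph." \
  -c 8192 -n 256 -ngl 99
```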