# M1NDB0T — GPT-OSS-20B (GGUF Quants)

Quantized GGUF builds of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for local inference with [llama.cpp](https://github.com/ggerganov/llama.cpp).

This repo currently ships **Q4_K_M** (speed/size sweet spot) and **Q8_0** (higher fidelity).

⚡ Drop it into your local stack, ask it weird questions, and let it dream in neon.
## Model Details

### Model Description

- **Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
- **This repo:** Conversion to GGUF + quantization for [llama.cpp](https://github.com/ggerganov/llama.cpp) inference.
- **Quantizations provided:** `Q4_K_M`, `Q8_0`
- **Languages:** Primarily English
- **License:** Apache-2.0 (inherits the base model's license)
- **Finetuned from:** None here (this repo is a conversion/quant pack)
### Model Sources

- **Repository (this page):** You're here 🙂
- **Base model:** https://huggingface.co/openai/gpt-oss-20b
- **Quantization tooling:** https://github.com/ggerganov/llama.cpp

## Downloads

> Tip: Relative links work once you've pushed the files. If you prefer an absolute URL, use:
> `https://huggingface.co/<user>/<repo>/resolve/main/<filename>`
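If you want to pull a single quant from the command line, `huggingface-cli` is one option (a minimal sketch; substitute your actual `<user>/<repo>`, and the filename shown is this repo's Q4_K_M build):

```bash
# Install the Hugging Face CLI, then download one GGUF into the current directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download <user>/<repo> Model-32x2.4B-Q4_K_M.gguf --local-dir .
```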
## Quick Start

### llama.cpp (CLI)

```bash
# Q4_K_M (small & fast)
./llama-cli -m Model-32x2.4B-Q4_K_M.gguf -cnv -p "Explain quantum tunneling like I'm 12."

# Q8_0 (larger & crisper)
./llama-cli -m Model-32x2.4B-Q8_0.gguf -cnv -p "Give 3 taglines for a cyberpunk AI named MindBot."
```
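Prefer an HTTP API? llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint (a minimal sketch; pick your own port and context size):

```bash
# Serve the Q4_K_M quant at http://localhost:8080/v1/chat/completions
./llama-server -m Model-32x2.4B-Q4_K_M.gguf --port 8080 -c 4096
```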
### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(model_path="Model-32x2.4B-Q4_K_M.gguf")
out = llm("Write a one-sentence mission statement for MindBot.")["choices"][0]["text"]
print(out.strip())
```
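If you don't have the binding yet, it installs from PyPI (prebuilt CPU wheels by default; GPU builds need the appropriate backend flags at install time):

```bash
pip install llama-cpp-python
```

`Llama(...)` also accepts `n_ctx` (context length) and `n_gpu_layers` (GPU offload) if the defaults don't fit your hardware.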
## Uses

### Direct Use

- Local chat / instruction following
- Prototyping small agents (add your own tool-calling layer)
- RAG assistants and creative ideation

### Downstream Use

- If you finetune the base HF weights, you can convert that finetune to GGUF and drop it here too (see the sketch after this list).
- Great fit for LoRA/QLoRA-style domain adapters before conversion.
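A rough sketch of that finetune-to-GGUF path, assuming a llama.cpp checkout built as in the reproduction section below and a Transformers-format finetune directory (all paths here are illustrative):

```bash
# Convert the finetuned HF weights to an FP16 GGUF, then quantize
python convert_hf_to_gguf.py /path/to/your-finetune --outfile finetune-F16.gguf --outtype f16
./build/bin/llama-quantize finetune-F16.gguf finetune-Q4_K_M.gguf q4_k_m
```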
### Out-of-Scope

- Safety-critical decisions (medical, legal, bio/chem, cyber offense) without human review + guardrails.
## Bias, Risks, and Limitations

- May hallucinate or reflect training biases. Keep a human in the loop for high-stakes tasks.
- Don't expose raw chain-of-thought to end users.
- Add content filtering aligned to your domain.

**Recommendations:** Log prompts/outputs, document known failure cases, and evaluate regularly.
## How to Reproduce (Quantization)

```bash
# Build the llama.cpp quantization tool
cd /workspace/llama.cpp
cmake -S . -B build && cmake --build build -j$(nproc) --target llama-quantize

# Quantize from your FP16 GGUF
./build/bin/llama-quantize /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q4_K_M.gguf q4_k_m
./build/bin/llama-quantize /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q8_0.gguf q8_0
```
Optional quality boost: pass an importance matrix to `llama-quantize` via `--imatrix imatrix.gguf` (a sketch follows below).
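A sketch of that imatrix flow, assuming you also build the `llama-imatrix` target and have a plain-text file of calibration prompts (`calibration.txt` is a placeholder for your own data):

```bash
# Compute an importance matrix from the FP16 model, then quantize with it
./build/bin/llama-imatrix -m /workspace/model/Model-32x2.4B-F16.gguf -f calibration.txt -o imatrix.gguf
./build/bin/llama-quantize --imatrix imatrix.gguf /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q4_K_M.gguf q4_k_m
```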
## Training Details

### Training Data

Referenced dataset (creative/"dreaming" work):

- [TheMindExpansionNetwork/SYNERGETIC_COGNITION_V1](https://huggingface.co/datasets/TheMindExpansionNetwork/SYNERGETIC_COGNITION_V1)
- Community article: MindBot Ultra – Dreaming Edition: Enhanced Dataset and Training Blueprint

(This GGUF release is a conversion/quantization of the base model; no new finetune is bundled here unless you add it.)

### Training Procedure

Not applicable for this repo (no additional training performed). If you publish a finetune later, document preprocessing, splits, and hyperparams here.
### Speeds, Sizes, Times (fill in after you test)

| Quant | Size | Prompt t/s | Gen t/s |
|---|---|---|---|
| Q4_K_M | ~X GB | ≈ Y | ≈ Z |
| Q8_0 | ~X GB | ≈ Y | ≈ Z |

(Numbers depend on your hardware.)
## Evaluation (optional)

- Add your benchmarks when ready (MT-Bench, MMLU, GSM8K, HumanEval, plus your domain tasks).
- Compare Q4_K_M vs. Q8_0 to guide users; a quick perplexity check is sketched below.
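Perplexity is one cheap proxy for quant quality, via llama.cpp's `llama-perplexity` tool (a sketch; `wiki.test.raw` stands in for whatever evaluation text you use, and lower is better):

```bash
# Run both quants over the same text and compare the final perplexity numbers
./build/bin/llama-perplexity -m Model-32x2.4B-Q4_K_M.gguf -f wiki.test.raw
./build/bin/llama-perplexity -m Model-32x2.4B-Q8_0.gguf -f wiki.test.raw
```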
## Environmental Impact

- This repo: CPU/GPU hours for conversion and quantization only (low).
- Base model pretraining: see the base model's card for details if/when disclosed.
## Technical Specs

### Architecture & Objective

Decoder-only transformer (20B class; the `32x2.4B` in the filenames reflects the base model's 32-expert mixture-of-experts layout) converted to GGUF for llama.cpp inference.

### Compute & Software

Inference: llama.cpp, llama-cpp-python, LM Studio (GGUF), etc.
## Citation

- Base model: OpenAI — gpt-oss-20b (Apache-2.0).
- This repo (quantization pack): TheMindExpansionNetwork/M1NDB0T — GPT-OSS-20B (GGUF Quants).
## Glossary

- **GGUF:** Binary format optimized for fast local inference via llama.cpp.
- **Q4_K_M / Q8_0:** Popular llama.cpp quant presets; Q4_K_M balances speed/quality, Q8_0 is closer to FP16.
## More Information

- Base model: https://huggingface.co/openai/gpt-oss-20b
- Dataset reference: https://huggingface.co/datasets/TheMindExpansionNetwork/SYNERGETIC_COGNITION_V1
- Article: MindBot Ultra – Dreaming Edition: Enhanced Dataset and Training Blueprint

## Model Card Authors

M1NDB0T

## Contact

Open an issue in the Community tab or add your contact here.