M1NDB0T — GPT-OSS-20B (GGUF Quants)

Quantized GGUF builds of openai/gpt-oss-20b for local inference with llama.cpp.
This repo currently ships Q4_K_M (speed/size sweet spot) and Q8_0 (higher-fidelity).

Drop it into your local stack, ask it weird questions, and let it dream in neon.


Model Details

Model Description

  • Base model: openai/gpt-oss-20b
  • This repo: Conversion to GGUF + quantization for llama.cpp inference.
  • Quantizations provided: Q4_K_M, Q8_0
  • Languages: Primarily English
  • License: Apache-2.0 (inherits base model’s license)
  • Finetuned from: None here (this repo is a conversion/quant pack)

Model Sources

  • Base model: https://huggingface.co/openai/gpt-oss-20b
  • Inference runtime: llama.cpp (https://github.com/ggml-org/llama.cpp)

Downloads

| File | Quant | Link |
|------|-------|------|
| Model-32x2.4B-Q4_K_M.gguf | Q4_K_M | Direct |
| Model-32x2.4B-Q8_0.gguf | Q8_0 | Direct |

Tip: Relative links work once you’ve pushed the files. If you prefer absolute: https://huggingface.co/<user>/<repo>/resolve/main/<filename>
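
If you'd rather script the download, the huggingface_hub CLI works with this repo's id; a minimal sketch:

```bash
# Requires: pip install -U huggingface_hub
huggingface-cli download TheMindExpansionNetwork/M1NDB0T-GPT-OSS-20B_GGUF \
  Model-32x2.4B-Q4_K_M.gguf --local-dir .
```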


Quick Start

llama.cpp (CLI)

```bash
# Q4_K_M (small & fast)
./llama-cli -m Model-32x2.4B-Q4_K_M.gguf -cnv -p "Explain quantum tunneling like I'm 12."

# Q8_0 (larger & crisper)
./llama-cli -m Model-32x2.4B-Q8_0.gguf -cnv -p "Give 3 taglines for a cyberpunk AI named MindBot."
```
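
Prefer an HTTP endpoint? The same files work with llama.cpp's bundled OpenAI-compatible server; the port and context size below are illustrative:

```bash
./llama-server -m Model-32x2.4B-Q4_K_M.gguf -c 4096 --port 8080
# Then POST chat requests to http://localhost:8080/v1/chat/completions
```
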
llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(model_path="Model-32x2.4B-Q4_K_M.gguf", n_ctx=4096)  # raise n_ctx for longer prompts
out = llm("Write a one-sentence mission statement for MindBot.",
          max_tokens=64)["choices"][0]["text"]  # default max_tokens (16) truncates replies
print(out.strip())
```
Uses

Direct Use

  • Local chat / instruction following
  • Prototyping small agents (add your own tool-calling layer)
  • RAG assistants and creative ideation

Downstream Use

  • If you finetune the base HF weights, you can convert the result to GGUF and publish it here too (see the conversion sketch below).
  • Great fit for LoRA/QLoRA-style domain adapters before conversion.
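
llama.cpp's converter script turns merged HF-format weights into an F16 GGUF you can quantize as shown later in this card; a minimal sketch, with the checkpoint path and output name as placeholders:

```bash
# From a llama.cpp checkout; expects merged (non-LoRA) HF weights on disk
python convert_hf_to_gguf.py /path/to/merged-finetune \
  --outfile Model-finetune-F16.gguf --outtype f16
```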

Out-of-Scope

  • Safety-critical decisions (medical, legal, bio/chem, cyber offense) without human review and guardrails.

Bias, Risks, and Limitations

  • May hallucinate or reflect training biases; keep a human in the loop for high-stakes tasks.
  • Don't expose raw chain-of-thought to end users.
  • Add content filtering aligned to your domain.

Recommendations: log prompts/outputs, document known failure cases, and evaluate regularly.

How to Reproduce (Quantization)

```bash
# Build the llama.cpp quantization tool
cd /workspace/llama.cpp
cmake -S . -B build && cmake --build build -j$(nproc) --target llama-quantize

# Quantize from your FP16 GGUF
./build/bin/llama-quantize /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q4_K_M.gguf q4_k_m
./build/bin/llama-quantize /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q8_0.gguf  q8_0
```

Optional quality boost: pass an importance matrix via --imatrix (sketch below).
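
A minimal sketch of the imatrix step; calibration.txt is a placeholder for a few MB of domain-representative text you supply:

```bash
# Build and run the imatrix tool against the F16 model
cmake --build build -j$(nproc) --target llama-imatrix
./build/bin/llama-imatrix -m /workspace/model/Model-32x2.4B-F16.gguf -f calibration.txt -o imatrix.gguf

# Re-quantize with the importance matrix applied
./build/bin/llama-quantize --imatrix imatrix.gguf \
  /workspace/model/Model-32x2.4B-F16.gguf Model-32x2.4B-Q4_K_M.gguf q4_k_m
```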

Training Details

Training Data

Referenced dataset (creative/“dreaming” work):

  • Dataset: TheMindExpansionNetwork/SYNERGETIC_COGNITION_V1
  • Community article: MindBot Ultra – Dreaming Edition: Enhanced Dataset and Training Blueprint

(This GGUF release is a conversion/quantization of the base model; no new finetune is bundled unless you add one.)

Training Procedure

Not applicable to this repo (no additional training performed). If you publish a finetune later, document preprocessing, splits, and hyperparameters here.

Speeds, Sizes, Times (fill in after you test; see the llama-bench sketch below)

  • Q4_K_M: size ~X GB, prompt t/s ≈ Y, gen t/s ≈ Z (your hardware)
  • Q8_0: size ~X GB, prompt t/s ≈ Y, gen t/s ≈ Z
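
llama.cpp's bundled benchmark reports exactly these numbers (pp = prompt t/s, tg = generation t/s):

```bash
cmake --build build -j$(nproc) --target llama-bench
./build/bin/llama-bench -m Model-32x2.4B-Q4_K_M.gguf
./build/bin/llama-bench -m Model-32x2.4B-Q8_0.gguf
```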

Evaluation (optional)

Add your benchmarks when ready (MT-Bench, MMLU, GSM8K, HumanEval, plus your domain tasks), and compare Q4_K_M vs Q8_0 to guide users.
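
For a quick quant-vs-quant quality proxy, perplexity on a held-out text file works; a sketch using llama.cpp's bundled tool, with wiki.test.raw as a placeholder corpus:

```bash
cmake --build build -j$(nproc) --target llama-perplexity
./build/bin/llama-perplexity -m Model-32x2.4B-Q4_K_M.gguf -f wiki.test.raw
./build/bin/llama-perplexity -m Model-32x2.4B-Q8_0.gguf  -f wiki.test.raw
```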

Environmental Impact

  • This repo: CPU/GPU hours for conversion and quantization only (low).
  • Base model pretraining: see the base model's card for details if/when disclosed.

Technical Specs

Architecture & Objective

Decoder-only transformer (gpt-oss architecture, 20.9B params) converted to GGUF for llama.cpp inference.

Compute & Software

Inference: llama.cpp, llama-cpp-python, LM Studio (GGUF), etc.

Citation

Base model: OpenAI — gpt-oss-20b (Apache-2.0).

This repo (quantization pack): TheMindExpansionNetwork — M1NDB0T — GPT-OSS-20B (GGUF Quants).

Glossary

  • GGUF: binary format optimized for fast local inference via llama.cpp.
  • Q4_K_M / Q8_0: popular llama.cpp quant presets; Q4_K_M balances speed and quality, Q8_0 is closer to FP16.

More Information

  • Base model: https://huggingface.co/openai/gpt-oss-20b
  • Dataset reference: https://huggingface.co/datasets/TheMindExpansionNetwork/SYNERGETIC_COGNITION_V1
  • Article: MindBot Ultra – Dreaming Edition: Enhanced Dataset and Training Blueprint

Model Card Authors

M1NDB0T

Contact

Open an issue in the Community tab, or add your contact here.