+---
+datasets:
+- GetSoloTech/Code-Reasoning
+language:
+- en
+base_model:
+- GetSoloTech/GPT-OSS-Code-Reasoning-20B
+pipeline_tag: text-generation
+tags:
+- coding
+- reasoning
+- problem-solving
+- algorithms
+- python
+- c++
+---
+# GPT-OSS-Code-Reasoning-20B-GGUF
+This repository contains GGUF quantizations of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model for use with llama.cpp and compatible runtimes. Lower-bit variants trade some output quality for a smaller memory footprint.
+## Overview
+- **Base model**: `openai/gpt-oss-20b`
+- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
+- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)
+## Model Variants
+This GGUF model is available in multiple quantization levels to suit different hardware requirements:
+| Quantization | Size | Memory Usage | Quality |
+|--------------|------|--------------|---------|
+| Q3_K_M       | 12.9 GB | ~13 GB | Average |
+| Q4_K_M       | 15.8 GB | ~16 GB | Good |
+| Q5_K_M       | 16.9 GB | ~17 GB | Better |
+| Q8_0         | 22.3 GB | ~23 GB | Best |
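As a rough illustration of how the table above might guide a choice, the sketch below picks the highest-quality variant whose approximate memory usage fits a given budget. The figures are copied from the table; the function name `pick_variant` and the `headroom_gb` margin (room for the context window and runtime overhead) are assumptions made for this example, not part of the model card.

```python
# Approximate memory usage (GB) per quantization, copied from the table above
# and ordered from lowest to highest quality.
VARIANTS = [
    ("Q3_K_M", 13.0),
    ("Q4_K_M", 16.0),
    ("Q5_K_M", 17.0),
    ("Q8_0", 23.0),
]


def pick_variant(available_gb, headroom_gb=2.0):
    """Return the highest-quality variant that fits, or None if none does.

    headroom_gb is a hypothetical safety margin for the context window and
    runtime overhead; tune it for your own setup.
    """
    budget = available_gb - headroom_gb
    best = None
    for name, usage_gb in VARIANTS:
        if usage_gb <= budget:
            best = name  # later entries are higher quality
    return best


if __name__ == "__main__":
    for ram in (12, 16, 24, 32):
        print(f"{ram} GB available -> {pick_variant(ram)}")
```

Treat the result as a starting point only: actual memory use also depends on context length and on how many layers are offloaded to a GPU.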
+## Intended Use
+- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
+- **Out of scope**: Safety-critical applications. The model may hallucinate or produce incorrect or inefficient code, so review and test its output before relying on it
+## Quick Start
+### Using llama.cpp
+```bash
+# Download the model
+wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf
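+# Run inference (the CLI binary is `llama-cli` in current llama.cpp builds; older builds name it `main`)
+./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -n 512 --repeat-penalty 1.1 -p "Solve this competitive programming problem: [your problem here]"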
+```
+### Using Python with llama-cpp-python
+```python
+from llama_cpp import Llama
+# Load the model
+llm = Llama(
+    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
+    n_ctx=4096,
+    n_threads=8
+)
+# Example problem
+problem_text = """
+You are given an array of integers nums and an integer target.
+Return indices of the two numbers such that they add up to target.
+"""
+# Create the prompt
+prompt = f"""<|im_start|>system
+You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
+<|im_end|>
+<|im_start|>user
+{problem_text}
+<|im_end|>
+<|im_start|>assistant
+"""
+# Generate response
+output = llm(
+    prompt,
+    max_tokens=768,
+    temperature=0.3,
+    top_p=0.9,
+    repeat_penalty=1.1,
+    stop=["<|im_end|>"]
+)
+print(output['choices'][0]['text'])
+```
+### Using Ollama
+```bash
+# Create a Modelfile
+cat > Modelfile << 'EOF'
+FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
+TEMPLATE """<|im_start|>system
+{{ .System }}
+<|im_end|>
+<|im_start|>user
+{{ .Prompt }}
+<|im_end|>
+<|im_start|>assistant
+"""
+PARAMETER temperature 0.3
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.1
+PARAMETER stop "<|im_end|>"
+EOF
+# Create and run the model
+ollama create code-reasoning -f Modelfile
+ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
+```
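If you switch between quantization files, the Modelfile above can be generated rather than edited by hand. The following is a minimal sketch under that assumption: `render_modelfile` is a hypothetical helper, and it simply reproduces the template and parameters shown above for whatever GGUF path you pass in.

```python
# ChatML-style template from the Modelfile above. Kept as a plain (non-f)
# string so the Go-template braces {{ ... }} are preserved literally.
TEMPLATE = (
    "<|im_start|>system\n"
    "{{ .System }}\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "{{ .Prompt }}\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def render_modelfile(gguf_path, temperature=0.3, top_p=0.9, repeat_penalty=1.1):
    """Return Modelfile text for the given GGUF file and sampling settings."""
    lines = [
        f"FROM {gguf_path}",
        'TEMPLATE """' + TEMPLATE + '"""',
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER repeat_penalty {repeat_penalty}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the Modelfile for the Q4_K_M file used in the example above.
    print(render_modelfile("./gpt-oss-code-reasoning-20b.Q4_K_M.gguf"), end="")
```

Redirect the printed text into a file named `Modelfile` and pass it to `ollama create` as shown above.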
+## Prompt Format
+The model was fine-tuned on chat-formatted data. With a chat-style API, pass messages in this structure:
+```python
+messages = [
+    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
+    {"role": "user", "content": problem_text},
+]
+```
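+When building the prompt string by hand for a GGUF runtime, use the ChatML-style layout below. The upstream `openai/gpt-oss-20b` base uses the Harmony chat format instead, so if outputs look malformed, check the chat template stored in the GGUF metadata: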
+```
+<|im_start|>system
+You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
+<|im_end|>
+<|im_start|>user
+{problem_text}
+<|im_end|>
+<|im_start|>assistant
+```
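To connect the `messages` list with the raw layout above, the sketch below renders one into the other. `render_chat` is a hypothetical helper written for this example; it assumes every message is a dict with `role` and `content` keys and that the ChatML-style layout shown above is the one the model expects.

```python
def render_chat(messages):
    """Render a list of {"role", "content"} dicts into the layout shown above.

    The returned string ends with an open assistant turn so the model
    continues from there.
    """
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"].strip()  # drop stray leading/trailing newlines
        parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Return indices of two numbers that add up to target."},
    ]
    print(render_chat(example))
```

The resulting string can be passed as the `prompt` argument in the llama-cpp-python example from the Quick Start section.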
+## Generation Tips
+- **Reasoning style**: Use a low temperature (0.2–0.5) for clearer step-by-step reasoning
+- **Length**: Set `max_tokens` to 512–1024 for full solutions; use a smaller value for hints
+- **Stop tokens**: Pass `<|im_end|>` as a stop sequence so generation ends when the assistant turn closes
+- **Memory optimization**: Pick the largest quantization from the Model Variants table that fits your RAM or VRAM
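The tips above can be collected into a small helper that returns sampling settings. This is a sketch, not a recommended configuration: the mode names `solution` and `hint` are hypothetical, and the 256-token hint budget is an assumption, since the tips only say "shorter". The other values repeat those used in the Quick Start examples.

```python
def generation_kwargs(mode="solution"):
    """Return sampling settings that follow the tips above.

    'solution' and 'hint' are hypothetical mode names used for illustration.
    """
    if mode == "solution":
        max_tokens = 1024  # upper end of the 512-1024 range for full solutions
    elif mode == "hint":
        max_tokens = 256  # assumed "shorter" budget; the tips give no number
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "max_tokens": max_tokens,
        "temperature": 0.3,  # inside the suggested 0.2-0.5 range
        "top_p": 0.9,
        "repeat_penalty": 1.1,
        "stop": ["<|im_end|>"],
    }


if __name__ == "__main__":
    print(generation_kwargs("hint"))
```

With llama-cpp-python, the dictionary can be unpacked into the call from the Quick Start example, for instance `llm(prompt, **generation_kwargs("hint"))`.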
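+## Hardware Requirements
+The minimum RAM figures below are smaller than the file sizes listed under Model Variants, so they presumably assume partial GPU offload or memory-mapped loading. To hold a model fully in memory, budget at least its file size plus room for the context: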
+| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
+|--------------|-------------|-----------------|----------|
+| Q3_K_M       | 8 GB        | 16 GB           | 8 GB     |
+| Q4_K_M       | 12 GB       | 24 GB           | 12 GB    |
+| Q5_K_M       | 16 GB       | 32 GB           | 16 GB    |
+| Q8_0         | 24 GB       | 48 GB           | 24 GB    |
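For a quick check against the table above, the sketch below reports how a given amount of RAM compares with each row. The figures are copied from the table; the function name `ram_report` and the three labels it returns are assumptions made for this example.

```python
# (minimum RAM, recommended RAM) in GB, copied from the table above.
REQUIREMENTS_GB = {
    "Q3_K_M": (8, 16),
    "Q4_K_M": (12, 24),
    "Q5_K_M": (16, 32),
    "Q8_0": (24, 48),
}


def ram_report(ram_gb):
    """Map each quantization to how ram_gb compares with the table's figures."""
    report = {}
    for name, (minimum, recommended) in REQUIREMENTS_GB.items():
        if ram_gb >= recommended:
            report[name] = "recommended"
        elif ram_gb >= minimum:
            report[name] = "minimum"
        else:
            report[name] = "insufficient"
    return report


if __name__ == "__main__":
    for name, status in ram_report(16).items():
        print(f"{name}: {status}")
```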
+## Performance Notes
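+- **Speed**: GGUF files load directly in llama.cpp-based runtimes; smaller quantizations generally run faster on memory-bandwidth-bound hardware
+- **Memory**: File sizes range from 12.9 GB (Q3_K_M) to 22.3 GB (Q8_0), as listed under Model Variants
+- **Quality**: Quality loss grows as the quantization gets smaller; Q8_0 stays closest to the original model, while Q3_K_M trades the most quality for size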
+- **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines
+## Acknowledgements
+- Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
+- Base model: `openai/gpt-oss-20b`
+- Dataset: `nvidia/OpenCodeReasoning-2`
+- Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`