---
datasets:
  - GetSoloTech/Code-Reasoning
language:
  - en
base_model:
  - GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
  - coding
  - reasoning
  - problem-solving
  - algorithms
  - python
  - c++
---

# GPT-OSS-Code-Reasoning-20B-GGUF

This is the GGUF-quantized version of the GPT-OSS-Code-Reasoning-20B model, packaged for efficient local inference with reduced memory requirements.

## Overview

- **Base model:** openai/gpt-oss-20b
- **Objective:** Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format:** GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size    | Memory Usage | Quality |
|--------------|---------|--------------|---------|
| Q3_K_M       | 12.9 GB | ~13 GB       | Average |
| Q4_K_M       | 15.8 GB | ~16 GB       | Good    |
| Q5_K_M       | 16.9 GB | ~17 GB       | Better  |
| Q8_0         | 22.3 GB | ~23 GB       | Best    |
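
To fetch a specific variant programmatically instead of with `wget`, a minimal sketch using `huggingface_hub` (the Q4_K_M filename matches the one used in the Quick Start below; swap in whichever quantization fits your hardware):

```python
from huggingface_hub import hf_hub_download

# Download one quantization variant from this repo to the local HF cache
model_path = hf_hub_download(
    repo_id="GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF",
    filename="gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
)
print(model_path)  # path to pass to llama.cpp / llama-cpp-python
```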

## Intended Use

- **Intended:** Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope:** Safety-critical applications. The model may hallucinate or produce incorrect/inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference (llama-cli is the CLI binary built by llama.cpp)
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf \
  -p "Return indices of two numbers in nums that add up to target." \
  -n 512 --repeat-penalty 1.1
```

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```
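
Alternatively, llama-cpp-python's chat completion API applies the chat template stored in the GGUF metadata, so you don't have to hand-build the `<|im_start|>` prompt yourself. A minimal sketch, reusing the sampling values from the example above:

```python
from llama_cpp import Llama

llm = Llama(model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf", n_ctx=4096)

# create_chat_completion renders the messages with the model's chat template
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Return indices of two numbers in nums that add up to target."},
    ],
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
)
print(result["choices"][0]["message"]["content"])
```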

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```
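
Once created, the model can also be called programmatically. A hedged sketch against Ollama's local REST API (assumes the default `localhost:11434` endpoint and the `requests` package):

```python
import requests

# Ollama's generate endpoint; stream=False returns one JSON object
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "code-reasoning",
        "prompt": "Return indices of two numbers in nums that add up to target.",
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])
```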

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

For GGUF models, use the following format:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
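
To bridge the two representations, a small hypothetical helper that renders a `messages` list into the raw prompt string above (the function name and structure are illustrative, not part of the original card):

```python
def build_prompt(messages):
    """Render chat messages into the <|im_start|>/<|im_end|> format shown above."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}\n<|im_end|>")
    # Leave the assistant turn open so the model completes it
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are an expert competitive programmer."},
    {"role": "user", "content": "Return indices of two numbers in nums that add up to target."},
])
```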

## Generation Tips

- **Reasoning style:** Lower temperature (0.2–0.5) for clearer step-by-step reasoning (illustrated in the sketch after this list)
- **Length:** Use `max_tokens` 512–1024 for full solutions; shorter for hints
- **Stop tokens:** The model uses `<|im_end|>` as a stop token
- **Memory optimization:** Choose the appropriate quantization level based on your hardware
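
As one way to apply these settings, a hedged sketch that streams a solution with the recommended sampling values (it reuses the `llm` and `prompt` objects from the Quick Start example):

```python
# Stream tokens with the recommended settings; stop on the chat end token
for chunk in llm(
    prompt,
    max_tokens=1024,     # 512-1024 for full solutions
    temperature=0.3,     # 0.2-0.5 for clearer step-by-step reasoning
    top_p=0.9,
    stop=["<|im_end|>"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```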

## Hardware Requirements

| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M       | 8 GB        | 16 GB           | 8 GB     |
| Q4_K_M       | 12 GB       | 24 GB           | 12 GB    |
| Q5_K_M       | 16 GB       | 32 GB           | 16 GB    |
| Q8_0         | 24 GB       | 48 GB           | 24 GB    |
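
If your GPU meets the VRAM column above, layers can be offloaded to it with llama-cpp-python. A minimal sketch (the `n_gpu_layers` value is an assumption to tune for your hardware, not a recommendation from the original card):

```python
from llama_cpp import Llama

# Offload all layers to the GPU when VRAM allows; use a smaller value
# (or 0 for CPU-only) if you run out of memory.
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = offload every layer; assumes a CUDA/Metal build of llama-cpp-python
)
```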

## Performance Notes

- **Speed:** GGUF models are optimized for fast inference
- **Memory:** Significantly reduced memory footprint compared to the original model
- **Quality:** Minimal quality loss with appropriate quantization levels
- **Compatibility:** Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines

## Acknowledgements

- **Original model:** GetSoloTech/GPT-OSS-Code-Reasoning-20B
- **Base model:** openai/gpt-oss-20b
- **Dataset:** nvidia/OpenCodeReasoning-2
- **Upstream benchmarks:** TACO, APPS, DeepMind CodeContests, open-r1/codeforces