---
datasets:
  - GetSoloTech/Code-Reasoning
language:
  - en
base_model:
  - GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
  - coding
  - reasoning
  - problem-solving
  - algorithms
  - python
  - c++
---

# GPT-OSS-Code-Reasoning-20B-GGUF

This is the GGUF-quantized version of the GPT-OSS-Code-Reasoning-20B model, packaged for efficient local inference with reduced memory requirements.

## Overview

- **Base model:** openai/gpt-oss-20b
- **Objective:** Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format:** GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size    | Memory Usage | Quality |
|--------------|---------|--------------|---------|
| Q3_K_M       | 12.9 GB | ~13 GB       | Average |
| Q4_K_M       | 15.8 GB | ~16 GB       | Good    |
| Q5_K_M       | 16.9 GB | ~17 GB       | Better  |
| Q8_0         | 22.3 GB | ~23 GB       | Best    |
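
To fetch a specific variant programmatically instead of with `wget`, a minimal sketch using `huggingface_hub` (the Q4_K_M filename matches the one used in the Quick Start below; swap in whichever quantization fits your hardware):

```python
from huggingface_hub import hf_hub_download

# Download one quantization variant from this repo to the local HF cache
model_path = hf_hub_download(
    repo_id="GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF",
    filename="gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
)
print(model_path)  # path to pass to llama.cpp / llama-cpp-python
```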

## Intended Use

- **Intended:** Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope:** Safety-critical applications. The model may hallucinate or produce incorrect/inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference (llama-cli is the CLI binary built by llama.cpp)
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf \
  -p "Return indices of two numbers in nums that add up to target." \
  -n 512 --repeat-penalty 1.1
```

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```
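
Alternatively, llama-cpp-python's chat completion API applies the chat template stored in the GGUF metadata, so you don't have to hand-build the `<|im_start|>` prompt yourself. A minimal sketch, reusing the sampling values from the example above:

```python
from llama_cpp import Llama

llm = Llama(model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf", n_ctx=4096)

# create_chat_completion renders the messages with the model's chat template
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Return indices of two numbers in nums that add up to target."},
    ],
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
)
print(result["choices"][0]["message"]["content"])
```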

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```
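
Once created, the model can also be called programmatically. A hedged sketch against Ollama's local REST API (assumes the default `localhost:11434` endpoint and the `requests` package):

```python
import requests

# Ollama's generate endpoint; stream=False returns one JSON object
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "code-reasoning",
        "prompt": "Return indices of two numbers in nums that add up to target.",
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])
```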

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

For GGUF models, use the following format:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
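
To bridge the two representations, a small hypothetical helper that renders a `messages` list into the raw prompt string above (the function name and structure are illustrative, not part of the original card):

```python
def build_prompt(messages):
    """Render chat messages into the <|im_start|>/<|im_end|> format shown above."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}\n<|im_end|>")
    # Leave the assistant turn open so the model completes it
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_prompt([
    {"role": "system", "content": "You are an expert competitive programmer."},
    {"role": "user", "content": "Return indices of two numbers in nums that add up to target."},
])
```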

## Generation Tips

- **Reasoning style:** Lower temperature (0.2–0.5) for clearer step-by-step reasoning (illustrated in the sketch after this list)
- **Length:** Use `max_tokens` 512–1024 for full solutions; shorter for hints
- **Stop tokens:** The model uses `<|im_end|>` as a stop token
- **Memory optimization:** Choose the appropriate quantization level based on your hardware
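
As one way to apply these settings, a hedged sketch that streams a solution with the recommended sampling values (it reuses the `llm` and `prompt` objects from the Quick Start example):

```python
# Stream tokens with the recommended settings; stop on the chat end token
for chunk in llm(
    prompt,
    max_tokens=1024,     # 512-1024 for full solutions
    temperature=0.3,     # 0.2-0.5 for clearer step-by-step reasoning
    top_p=0.9,
    stop=["<|im_end|>"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```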

## Hardware Requirements

| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M       | 8 GB        | 16 GB           | 8 GB     |
| Q4_K_M       | 12 GB       | 24 GB           | 12 GB    |
| Q5_K_M       | 16 GB       | 32 GB           | 16 GB    |
| Q8_0         | 24 GB       | 48 GB           | 24 GB    |
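
If your GPU meets the VRAM column above, layers can be offloaded to it with llama-cpp-python. A minimal sketch (the `n_gpu_layers` value is an assumption to tune for your hardware, not a recommendation from the original card):

```python
from llama_cpp import Llama

# Offload all layers to the GPU when VRAM allows; use a smaller value
# (or 0 for CPU-only) if you run out of memory.
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = offload every layer; assumes a CUDA/Metal build of llama-cpp-python
)
```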

## Performance Notes

- **Speed:** GGUF models are optimized for fast inference
- **Memory:** Significantly reduced memory footprint compared to the original model
- **Quality:** Minimal quality loss with appropriate quantization levels
- **Compatibility:** Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines

## Acknowledgements

- **Original model:** GetSoloTech/GPT-OSS-Code-Reasoning-20B
- **Base model:** openai/gpt-oss-20b
- **Dataset:** nvidia/OpenCodeReasoning-2
- **Upstream benchmarks:** TACO, APPS, DeepMind CodeContests, open-r1/codeforces