---
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
---

# GPT-OSS-Code-Reasoning-20B-GGUF

This is the GGUF quantized version of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model, optimized for efficient inference with reduced memory requirements.

## Overview

- **Base model**: `openai/gpt-oss-20b`
- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size | Memory Usage | Quality |
|--------------|------|--------------|---------|
| Q3_K_M | 12.9 GB | ~13 GB | Average |
| Q4_K_M | 15.8 GB | ~16 GB | Good |
| Q5_K_M | 16.9 GB | ~17 GB | Better |
| Q8_0 | 22.3 GB | ~23 GB | Best |
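
As a rough guide, the memory figures in the table above can drive an automatic choice of quantization. The sketch below is illustrative only (the `pick_quant` helper and its selection strategy are not part of the release):

```python
from typing import Optional

# Approximate in-memory footprint (GB) per quantization, taken from the table above.
QUANT_MEMORY_GB = {
    "Q3_K_M": 13,
    "Q4_K_M": 16,
    "Q5_K_M": 17,
    "Q8_0": 23,
}

def pick_quant(available_gb: float) -> Optional[str]:
    """Return the highest-quality quantization that fits the given memory budget."""
    # Walk from the largest (best quality) variant down and take the first that fits.
    for name in sorted(QUANT_MEMORY_GB, key=QUANT_MEMORY_GB.get, reverse=True):
        if QUANT_MEMORY_GB[name] <= available_gb:
            return name
    return None

print(pick_quant(18))  # prints "Q5_K_M": the best variant that fits in 18 GB
```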

## Intended Use

- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope**: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference with the llama-cli binary built from llama.cpp
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -n 512 --repeat-penalty 1.1 -p "Write an efficient two-sum solution in Python."
```

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

For GGUF models, use the following raw template:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
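
The chat template above can be assembled programmatically. This is a minimal sketch (the `build_prompt` helper is illustrative, not part of the release), mirroring the raw format exactly:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt matching the documented template."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.",
    "Given an array of integers and a target, return indices of two numbers that sum to the target.",
)
print(prompt)
```

The returned string ends after the `<|im_start|>assistant` turn, so generation continues from the assistant's reply; pair it with `stop=["<|im_end|>"]` when sampling.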

## Generation Tips

- **Reasoning style**: Use a lower temperature (0.2–0.5) for clearer step-by-step reasoning
- **Length**: Use `max_tokens` of 512–1024 for full solutions; shorter for hints
- **Stop tokens**: The model uses `<|im_end|>` as a stop token
- **Memory optimization**: Choose the quantization level appropriate for your hardware
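
The tips above can be captured as reusable sampling presets. These values are starting points drawn from the recommendations, not tuned benchmarks, and the `PRESETS` dict itself is a hypothetical convenience:

```python
# Sampling presets derived from the generation tips above (assumed starting points).
PRESETS = {
    "full_solution": {"max_tokens": 1024, "temperature": 0.3, "top_p": 0.9, "repeat_penalty": 1.1},
    "hint":          {"max_tokens": 256,  "temperature": 0.4, "top_p": 0.9, "repeat_penalty": 1.1},
}

# Unpack a preset into a llama-cpp-python call, e.g.:
# output = llm(prompt, stop=["<|im_end|>"], **PRESETS["full_solution"])
```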
151
+ ## Hardware Requirements
152
+
153
+ | Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
154
+ |--------------|-------------|-----------------|----------|
155
+ | Q3_K_M | 8 GB | 16 GB | 8 GB |
156
+ | Q4_K_M | 12 GB | 24 GB | 12 GB |
157
+ | Q5_K_M | 16 GB | 32 GB | 16 GB |
158
+ | Q8_0 | 24 GB | 48 GB | 24 GB |
159
+
160
+ ## Performance Notes
161
+
162
+ - **Speed**: GGUF models are optimized for fast inference
163
+ - **Memory**: Significantly reduced memory footprint compared to the original model
164
+ - **Quality**: Minimal quality loss with appropriate quantization levels
165
+ - **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines
166
+
167
+
168
+ ## Acknowledgements
169
+
170
+ - Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
171
+ - Base model: `openai/gpt-oss-20b`
172
+ - Dataset: `nvidia/OpenCodeReasoning-2`
173
+ - Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`
174
+