---
tags:
- gguf
- quantized
- gpt-oss
- multilingual
- text-generation
- llama-cpp
- ollama
language:
- en
- es
- fr
- de
- it
- pt
license: apache-2.0
model_type: gpt-oss
pipeline_tag: text-generation
base_model: openai/gpt-oss-20b
---

# GPT-OSS-20B Function Calling GGUF

This repository contains GPT-OSS-20B fine-tuned on function-calling data and converted to GGUF format for efficient inference with llama.cpp and Ollama.

## Model Details

- **Base Model:** openai/gpt-oss-20b
- **Fine-tuning Dataset:** Salesforce/xlam-function-calling-60k (2,000-sample subset)
- **Fine-tuning Method:** LoRA (r=8, alpha=16)
- **Context Length:** 131,072 tokens
- **Model Size:** 20B parameters

## Files

- `gpt-oss-20b-function-calling-f16.gguf`: F16 precision model (best quality)
- `gpt-oss-20b-function-calling.Q4_K_M.gguf`: Q4_K_M quantized model (recommended for inference)
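
As an alternative to the raw `wget` shown in the Usage section, a single file can be pulled with the Hugging Face CLI; a minimal sketch, assuming `huggingface_hub` is installed:

```bash
# Assumes: pip install -U huggingface_hub
# Fetch only the Q4_K_M file into the current directory
huggingface-cli download cuijian0819/gpt-oss-20b-function-calling-gguf \
  gpt-oss-20b-function-calling.Q4_K_M.gguf --local-dir .
```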

## Usage

### With Ollama (Recommended)

```bash
# Direct from Hugging Face
ollama run hf.co/cuijian0819/gpt-oss-20b-function-calling-gguf:Q4_K_M

# Or create a local model from the Modelfile shown below
ollama create my-gpt-oss -f Modelfile
ollama run my-gpt-oss
```
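
Ollama also serves a local REST API (port 11434 by default), so the model can be queried programmatically once it is running; a minimal sketch against the `my-gpt-oss` model created above:

```bash
# Single non-streaming completion through Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "my-gpt-oss",
  "prompt": "What is the capital of France?",
  "stream": false
}'
```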

### With llama.cpp

```bash
# Download model
wget https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf/resolve/main/gpt-oss-20b-function-calling.Q4_K_M.gguf

# Run inference
./llama-cli -m gpt-oss-20b-function-calling.Q4_K_M.gguf -p "Your prompt here"
```
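
llama.cpp also includes an HTTP server (`llama-server`) that exposes a completion endpoint; a minimal sketch for serving the quantized file locally (exact flags can vary between llama.cpp releases):

```bash
# Serve the model over HTTP (listens on port 8080 by default)
./llama-server -m gpt-oss-20b-function-calling.Q4_K_M.gguf -c 8192

# In another shell: request a completion from the built-in endpoint
curl http://localhost:8080/completion -d '{
  "prompt": "Your prompt here",
  "n_predict": 128
}'
```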

### Example Modelfile for Ollama

```dockerfile
FROM ./gpt-oss-20b-function-calling.Q4_K_M.gguf

TEMPLATE """<|start|>user<|message|>{{ .Prompt }}<|end|>
<|start|>assistant<|channel|>final<|message|>"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM """You are a helpful AI assistant that can call functions to help users."""
```
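
Once the model has been created from this Modelfile (`ollama create my-gpt-oss -f Modelfile`), it can be exercised with a function-calling style prompt. The inlined tool schema below is a hypothetical illustration; the exact format the model expects follows the xlam fine-tuning data:

```bash
# Hypothetical tool schema inlined in the prompt (illustrative only)
ollama run my-gpt-oss 'Available tool: {"name": "get_weather", "parameters": {"city": "string"}}. User: What is the weather in Tokyo?'
```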

## PyTorch Version

For training or further fine-tuning with PyTorch/Transformers, see the PyTorch checkpoint: [cuijian0819/gpt-oss-20b-function-calling](https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling)

## Performance

The Q4_K_M quantized version offers a strong size/quality trade-off:
- **Size Reduction:** ~62% smaller than F16
- **Memory Requirements:** ~16 GB VRAM recommended
- **Quality:** minimal degradation relative to F16

## License

This model inherits the license from the base openai/gpt-oss-20b model.

## Citation

```bibtex
@misc{gpt-oss-20b-function-calling-gguf,
  title={GPT-OSS-20B Function Calling GGUF},
  author={cuijian0819},
  year={2025},
  url={https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf}
}
```