---
tags:
- gguf
- quantized
- gpt-oss
- multilingual
- text-generation
- llama-cpp
- ollama
language:
- en
- es
- fr
- de
- it
- pt
license: apache-2.0
model_type: gpt-oss
pipeline_tag: text-generation
base_model: openai/gpt-oss-20b
---
# GPT-OSS-20B Function Calling GGUF
This repository contains the GPT-OSS-20B model fine-tuned on function-calling data and converted to GGUF format for efficient inference with llama.cpp and Ollama.
## Model Details
- **Base Model:** openai/gpt-oss-20b
- **Fine-tuning Dataset:** Salesforce/xlam-function-calling-60k (2000 samples)
- **Fine-tuning Method:** LoRA (r=8, alpha=16)
- **Context Length:** 131,072 tokens
- **Model Size:** 20B parameters
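The listed LoRA settings (r=8, alpha=16) add only a small number of trainable parameters on top of the frozen 20B base. As a rough, hedged illustration (the exact target modules and hidden sizes used for this fine-tune are assumptions here), a rank-r LoRA adapter on one `d_in × d_out` linear layer trains `r * (d_in + d_out)` parameters:

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA adapter on one
    d_in x d_out linear layer: a (d_in x r) A matrix plus an
    (r x d_out) B matrix."""
    return r * (d_in + d_out)

# Hypothetical example: a 4096 x 4096 attention projection at r=8
# adds 8 * (4096 + 4096) = 65,536 trainable parameters -- negligible
# next to the 20B frozen base weights.
print(lora_param_count(4096, 4096, 8))
```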
## Files
- `gpt-oss-20b-function-calling-f16.gguf`: F16 precision model (best quality)
- `gpt-oss-20b-function-calling.Q4_K_M.gguf`: Q4_K_M quantized model (recommended for inference)
## Usage
### With Ollama (Recommended)
```bash
# Direct from Hugging Face
ollama run hf.co/cuijian0819/gpt-oss-20b-function-calling-gguf:Q4_K_M
# Or create a local model
ollama create my-gpt-oss -f Modelfile
ollama run my-gpt-oss
```
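Once the model is running under Ollama, it is also reachable through Ollama's local REST API (by default `http://localhost:11434/api/generate`). A hedged sketch that builds the request body and posts it; the model name `my-gpt-oss` matches the `ollama create` step above:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate endpoint; stream=False asks
    for a single JSON response instead of a token stream."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "my-gpt-oss") -> str:
    """POST the prompt to a locally running Ollama server."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```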
### With llama.cpp
```bash
# Download model
wget https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf/resolve/main/gpt-oss-20b-function-calling.Q4_K_M.gguf
# Run inference
./llama-cli -m gpt-oss-20b-function-calling.Q4_K_M.gguf -p "Your prompt here"
```
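If you want to drive `llama-cli` from a script rather than the shell, a thin subprocess wrapper works; this is a sketch, and it assumes a locally built llama.cpp binary at `./llama-cli`:

```python
import subprocess

def llama_cmd(model: str, prompt: str, n_predict: int = 128) -> list:
    """Build a non-interactive llama-cli invocation: -m selects the
    GGUF file, -p the prompt, -n caps the number of generated tokens."""
    return ["./llama-cli", "-m", model, "-p", prompt, "-n", str(n_predict)]

def run_llama(prompt: str,
              model: str = "gpt-oss-20b-function-calling.Q4_K_M.gguf") -> str:
    """Run llama-cli and return its stdout (requires llama.cpp built locally)."""
    out = subprocess.run(llama_cmd(model, prompt),
                         capture_output=True, text=True, check=True)
    return out.stdout
```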
### Example Modelfile for Ollama
```dockerfile
FROM ./gpt-oss-20b-function-calling.Q4_K_M.gguf
TEMPLATE """<|start|>user<|message|>{{ .Prompt }}<|end|>
<|start|>assistant<|channel|>final<|message|>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM """You are a helpful AI assistant that can call functions to help users."""
```
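The TEMPLATE above uses the harmony-style tokens expected by gpt-oss. As a hedged illustration of what Ollama substitutes for `{{ .Prompt }}`, here is the same rendering done by hand:

```python
def render_prompt(user_message: str) -> str:
    """Mirror the Modelfile TEMPLATE: wrap the user turn in
    harmony-style tokens and open the assistant's final channel."""
    return (
        f"<|start|>user<|message|>{user_message}<|end|>\n"
        "<|start|>assistant<|channel|>final<|message|>"
    )

print(render_prompt("What is the capital of France?"))
```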
## PyTorch Version
For training and fine-tuning with PyTorch/Transformers, check out the PyTorch version: [cuijian0819/gpt-oss-20b-function-calling](https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling)
## Performance
The Q4_K_M quantized version offers a strong size-to-quality trade-off:
- **Size Reduction:** ~62% smaller than F16
- **Memory Requirements:** ~16GB VRAM recommended
- **Quality:** Minimal degradation from quantization
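The ~62% figure is consistent with back-of-the-envelope arithmetic: F16 stores about 2 bytes per weight (~40 GB for 20B parameters), so a 62% reduction lands near 15 GB, in line with the ~16 GB VRAM guidance. A rough sketch (the bytes-per-weight value is an approximation that ignores metadata and any non-quantized tensors):

```python
def approx_size_gb(n_params: float, bytes_per_weight: float) -> float:
    """Approximate model file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_weight / 1e9

f16 = approx_size_gb(20e9, 2.0)   # ~40 GB at 2 bytes/weight
q4 = f16 * (1 - 0.62)             # apply the ~62% reduction
print(round(f16), round(q4, 1))   # roughly 40 and 15.2
```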
## License
This model inherits the license from the base openai/gpt-oss-20b model.
## Citation
```bibtex
@misc{gpt-oss-20b-function-calling-gguf,
  title={GPT-OSS-20B Function Calling GGUF},
  author={cuijian0819},
  year={2025},
  url={https://huggingface.co/cuijian0819/gpt-oss-20b-function-calling-gguf}
}
```