gpt-oss-120b-GGUF

Read our guide on using gpt-oss to learn how to adjust its responses

Highlights

  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
  • Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory.

Refer to the original model card for more details on the model.

Quants

Link URI Size
GGUF hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf 63.4GB
GGUF hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.F16-00001-of-00002.gguf 65.4GB

Download a quant using node-llama-cpp (more info):

npx -y node-llama-cpp pull <URI>

Usage

Use with node-llama-cpp (recommended)

CLI

Chat with the model:

npx -y node-llama-cpp chat hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf

Ensure that you have Node.js installed first:

brew install nodejs

Code

Use it in your Node.js project:

npm install node-llama-cpp

import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";


const llama = await getLlama();

// download the model file if it isn't already cached locally, then load it
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();

// the chat session keeps track of the conversation history
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});


const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

Read the getting started guide to quickly scaffold a new node-llama-cpp project

Customize inference options

Set Harmony options using HarmonyChatWrapper:

import {
    getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-120b-GGUF/gpt-oss-120b.MXFP4-00001-of-00002.gguf";


const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",

        // one of "low", "medium" or "high"
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);

Use with llama.cpp

Install llama.cpp through Homebrew (works on macOS and Linux):

brew install llama.cpp

CLI

llama-cli --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -p "The meaning to life and the universe is"

Server

llama-server --hf-repo giladgd/gpt-oss-120b-GGUF --hf-file gpt-oss-120b.MXFP4-00001-of-00002.gguf -c 2048
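
The server exposes an OpenAI-compatible HTTP API. A minimal sketch of querying it from Node.js, assuming the default host and port (http://localhost:8080):

// send a chat completion request to the running llama-server
// (the endpoint and default port are llama.cpp's OpenAI-compatible defaults)
const response = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "What is the meaning to life and the universe?"}
        ]
    })
});
const data = await response.json();
console.log(data.choices[0].message.content);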