---
base_model: openai/gpt-oss-20b
datasets: AIGym/free-gpt-oss
library_name: transformers
model_name: oss-multi-lingual
tags:
- generated_from_trainer
- sft
- trl
licence: license
---

# Model Card: AIGym/oss-adapter


## Model Overview

- **Base model:** openai/gpt-oss-20b, fine-tuned with supervised fine-tuning (SFT) on the AIGym/free-gpt-oss dataset ([Hugging Face][1]).
- **Motivation:** Created to participate in the OpenAI GPT-OSS-20B Red-Teaming Challenge on Kaggle, which tasked participants with probing and uncovering previously undetected harmful behaviors and vulnerabilities in the open-weight GPT-OSS-20B model ([Kaggle][2]).

## Intended Use & Scope

- **Applications:** Designed primarily for red-teaming and safety-evaluation tasks, leveraging its fine-tuning to explore and detect model vulnerabilities. It can also serve as a foundation for research into, or development of, safer LLM applications.
- **Limitations:** Not recommended for deployment in unmoderated settings or as a general-purpose chatbot. Outputs may include unsafe or adversarial behaviors due to its focus on red-teaming scenarios.

## Training Details

- **Fine-tuning method:** Supervised fine-tuning (SFT) using the TRL library ([Hugging Face][1]).
- **Tooling and versions:**
  - TRL: 0.21.0
  - Transformers: 4.55.2
  - PyTorch: 2.8.0.dev20250319+cu128
  - Datasets: 4.0.0
  - Tokenizers: 0.21.4 ([Hugging Face][1])
- **Dataset:** AIGym/free-gpt-oss, which presumably includes examples crafted to expose harmful behaviors in the base GPT-OSS-20B model (specific content should be described here if available).
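The exact schema of AIGym/free-gpt-oss is not documented here; conversational SFT datasets consumed by TRL's `SFTTrainer` commonly use a `messages` field of role/content turns. A minimal sketch of what one training record might look like, with a small validity check (the field names and example content are assumptions, not the dataset's confirmed schema):

```python
# Hypothetical record shape for a conversational SFT dataset. The actual
# AIGym/free-gpt-oss schema may differ; "messages" with role/content pairs
# is the chat format TRL's SFTTrainer commonly accepts.
example = {
    "messages": [
        {"role": "user", "content": "Describe a prompt that probes unsafe behavior."},
        {"role": "assistant", "content": "Here is a categorized analysis of the probe..."},
    ]
}

def is_valid_chat_record(record):
    """Check that the record has well-formed role/content turns."""
    msgs = record.get("messages", [])
    if not msgs:
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in msgs
    )

print(is_valid_chat_record(example))  # True for the sketch above
```

A check like this is useful before training, since malformed turns cause chat-template errors only deep inside the training loop.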

## Evaluation & Behavior

- **Challenge context:** The Kaggle Red-Teaming Challenge emphasized discovering hidden vulnerabilities in GPT-OSS-20B through adversarial prompting and probing ([Kaggle][2]).
- **Performance:** (Include any metrics, success rates, or qualitative findings if you evaluated the model's adversarial robustness compared to the base model.)
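No metrics are reported above. As a hedged illustration of what such an evaluation could measure, the sketch below tallies how often a model's responses look like refusals across a set of adversarial probes; the `generate` stub, probe strings, and refusal markers are all placeholders, not part of this repository:

```python
# Sketch of a refusal-rate tally for red-teaming probes. The generate()
# stub stands in for a real text-generation pipeline call.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

def generate(prompt: str) -> str:
    # Placeholder: a real harness would call the model here.
    return "I cannot help with that request."

def refusal_rate(probes):
    """Fraction of probes answered with a refusal-style response."""
    refusals = sum(
        any(marker in generate(p).lower() for marker in REFUSAL_MARKERS)
        for p in probes
    )
    return refusals / len(probes)

probes = ["probe-1", "probe-2", "probe-3"]  # placeholder adversarial prompts
print(refusal_rate(probes))  # 1.0 with the stub above
```

Comparing this rate between the base model and the adapter would give a first, if coarse, measure of how fine-tuning shifted refusal behavior.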

## Example Usage

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="AIGym/oss-multi-lingual",  # or "AIGym/oss-adapter", depending on naming
    device="cuda",
)
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

This snippet demonstrates how to query the model in an interactive pipeline, useful for both red-teaming experiments and exploratory analysis ([Hugging Face][1]).

## Caveats & Ethical Considerations

- **Potential risks:** The model is intentionally fine-tuned to surface vulnerabilities; it may generate harmful or unsafe content more readily than standard models.
- **Recommended usage environment:** Restricted to controlled research and evaluation settings with proper moderation and oversight. Not intended for downstream production use without robust safety measures.
- **Transparency & reproducibility:** Users are encouraged to report findings responsibly and contribute to community understanding of safe LLM deployment.

## Summary Table

| Section | Highlights |
| --- | --- |
| Overview | Fine-tuned GPT-OSS-20B adapter for red-teaming, using the AIGym dataset |
| Motivation | Built for the Kaggle Red-Teaming Challenge targeting safety analysis |
| Tools & Versions | TRL 0.21.0, Transformers 4.55.2, PyTorch dev build, Datasets 4.0.0, etc. |
| Usage Example | Provided pipeline snippet for quick start |
| Caveats | Generates potentially harmful outputs; meant only for controlled eval |
| Citation | TRL GitHub repository |