|
---
license: apache-2.0
base_model: openai/gpt-oss-20b
tags:
- multilingual
- reasoning
- thinking
- fine-tuned
- lora
- conversational
language:
- multilingual
- en
- es
- ar
- fr
- de
- zh
- ja
- ko
- hi
- ru
datasets:
- HuggingFaceH4/Multilingual-Thinking
library_name: transformers
pipeline_tag: text-generation
---
|
|
|
# GPT-OSS-NEMO-20B: Multilingual Thinking Model |
|
|
|
## Model Description |
|
|
|
**GPT-OSS-NEMO-20B** is a fine-tuned version of OpenAI's GPT-OSS-20B, adapted for multilingual reasoning and thinking. It was trained with Supervised Fine-Tuning (SFT) on the HuggingFaceH4/Multilingual-Thinking dataset so that it can carry out its chain-of-thought in a requested language while remaining effective across the other supported languages.
|
|
|
## Key Features |
|
|
|
- 🌍 **Multilingual Reasoning**: Enhanced ability to think and reason in multiple languages |
|
- 🧠 **Chain-of-Thought**: Improved reasoning capabilities with explicit thinking processes |
|
- 💬 **Conversational**: Optimized for interactive dialogue and question-answering |
|
- 🎯 **Cross-lingual**: Can reason in one language and respond in another |
|
- ⚡ **High Performance**: Built on the robust 20B parameter GPT-OSS foundation |
|
|
|
## Training Details |
|
|
|
### Base Model |
|
- **Model**: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) |
|
- **Parameters**: 20 billion
|
- **Architecture**: GPT-OSS (Mixture of Experts) |
|
|
|
### Fine-tuning Configuration |
|
- **Method**: LoRA (Low-Rank Adaptation) |
|
- **Rank (r)**: 8 |
|
- **Alpha**: 16 |
|
- **Target Modules**: All linear layers, with additional adapters on the MoE expert weights

- **Target Parameters**: MLP expert projections (`gate_up_proj`, `down_proj`) in layers 7, 15, and 23 (see the sketch below)
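
For reference, these settings map onto a `peft` `LoraConfig` roughly as follows. Treat it as a minimal sketch rather than the exact training script: the `target_parameters` paths are inferred from the layer list above, and targeting expert weights this way requires a recent `peft` release.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above (expert paths are assumptions).
peft_config = LoraConfig(
    r=8,                          # LoRA rank
    lora_alpha=16,                # LoRA scaling factor
    target_modules="all-linear",  # attach adapters to every linear layer
    # MoE expert weights are plain parameters, so they are addressed via
    # target_parameters; the paths below are inferred from the layer list above.
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
```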
|
|
|
### Training Infrastructure |
|
- **Hardware**: 4x NVIDIA H100 GPUs |
|
- **Cloud Platform**: Microsoft Azure NC-series instances |
|
- **Training Framework**: TRL (Transformer Reinforcement Learning)
|
- **Optimization**: AdamW with cosine learning rate scheduling |
|
|
|
### Training Hyperparameters |
|
- **Learning Rate**: 2e-4 |
|
- **Batch Size**: 4 per device (16 per step across 4 GPUs; effective batch size of 64 with gradient accumulation)
|
- **Gradient Accumulation**: 4 steps |
|
- **Epochs**: 4 |
|
- **Max Sequence Length**: 2048 tokens |
|
- **Warmup Ratio**: 3% |
|
- **LR Scheduler**: Cosine decay with a minimum learning rate of 10% of the peak

- **Gradient Checkpointing**: Enabled (the full argument set is sketched below)
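
These hyperparameters correspond approximately to the following TRL `SFTConfig`. This is a sketch under stated assumptions: argument names (for example `max_length`) follow recent TRL releases, AdamW is the framework default optimizer, and `output_dir` is a placeholder.

```python
from trl import SFTConfig

# Approximate reconstruction of the training arguments listed above.
training_args = SFTConfig(
    learning_rate=2e-4,
    per_device_train_batch_size=4,   # 4 per GPU, 16 across 4 GPUs
    gradient_accumulation_steps=4,
    num_train_epochs=4,
    max_length=2048,                 # named max_seq_length in older TRL releases
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # floor the LR at 10% of the peak
    gradient_checkpointing=True,
    bf16=True,
    output_dir="gpt-oss-nemo-20b",   # placeholder output directory
)
```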
|
|
|
### Dataset |
|
- **Name**: [HuggingFaceH4/Multilingual-Thinking](https://huggingface.co/datasets/HuggingFaceH4/Multilingual-Thinking) |
|
- **Purpose**: Multilingual reasoning and thinking enhancement |
|
- **Languages**: Multiple languages including English, Spanish, Arabic, French, German, Chinese, Japanese, Korean, Hindi, Russian |
|
- **Training Split**: Full training set (loading snippet below)
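
The dataset can be pulled directly with the `datasets` library; a minimal loading snippet, assuming the default `train` split, is:

```python
from datasets import load_dataset

# Load the full training split used for fine-tuning
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # inspect one multilingual reasoning example
```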
|
|
|
## Usage |
|
|
|
### Quick Start |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("justinj92/gpt-oss-nemo-20b")

# Example: cross-lingual reasoning. The question is in Spanish
# ("What is the capital of Australia?") and the reasoning language is Arabic.
messages = [
    {"role": "system", "content": "reasoning language: Arabic"},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
|
|
|
### Advanced Usage with Custom Reasoning Language |
|
|
|
```python
# Specify the reasoning language in the system prompt
reasoning_language = "French"  # can be any supported language (see Language Support below)
system_prompt = f"reasoning language: {reasoning_language}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]
```
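
Generation then proceeds exactly as in the Quick Start example; only the system prompt changes, which steers the language used for the model's intermediate reasoning.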
|
|
|
## Model Capabilities |
|
|
|
### Multilingual Reasoning |
|
The model can: |
|
- Think and reason in a specified language (via system prompt) |
|
- Process questions in one language and reason in another |
|
- Maintain coherent logic across language boundaries |
|
- Provide explanations with explicit reasoning steps |
|
|
|
### Language Support |
|
Primary languages include: |
|
- **English** (en) |
|
- **Spanish** (es) |
|
- **Arabic** (ar) |
|
- **French** (fr) |
|
- **German** (de) |
|
- **Chinese** (zh) |
|
- **Japanese** (ja) |
|
- **Korean** (ko) |
|
- **Hindi** (hi) |
|
- **Russian** (ru) |
|
|
|
## Performance |
|
|
|
The model demonstrates improved performance in: |
|
- Cross-lingual reasoning tasks |
|
- Multi-step problem solving |
|
- Contextual understanding across languages |
|
- Maintaining coherence in multilingual conversations |
|
|
|
## Limitations |
|
|
|
- Performance may vary across different languages |
|
- Complex reasoning in low-resource languages may be limited |
|
- Generated content should be verified for factual accuracy |
|
- May exhibit biases present in the training data |
|
|
|
## Technical Specifications |
|
|
|
- **Model Size**: ~20B parameters |
|
- **Precision**: BF16 (bfloat16); an explicit-precision loading sketch is given below
|
- **Memory Requirements**: ~40GB VRAM for inference |
|
- **Recommended Hardware**: NVIDIA A100/H100 or similar high-memory GPUs |
|
- **Framework Compatibility**: transformers, torch, accelerate |
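
If you prefer to pin the precision explicitly rather than rely on `torch_dtype="auto"`, a minimal loading sketch (assuming enough GPU memory is available for `accelerate` to place the weights) looks like this:

```python
import torch
from transformers import AutoModelForCausalLM

# Explicit BF16 load; device_map="auto" lets accelerate shard the weights
# across the available GPUs (roughly 40 GB of VRAM in total is needed).
model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```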
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
```bibtex |
|
@misc{gpt-oss-nemo-20b, |
|
title={GPT-OSS-NEMO-20B: A Multilingual Thinking Model}, |
|
author={justinj92}, |
|
year={2025}, |
|
howpublished={\url{https://huggingface.co/justinj92/gpt-oss-nemo-20b}}, |
|
note={Fine-tuned from openai/gpt-oss-20b using HuggingFaceH4/Multilingual-Thinking} |
|
} |
|
``` |
|
|
|
## Acknowledgments |
|
|
|
- **Base Model**: OpenAI GPT-OSS-20B team |
|
- **Dataset**: HuggingFace H4 team for the Multilingual-Thinking dataset |
|
- **Infrastructure**: Microsoft Azure for cloud computing resources |
|
- **Framework**: Hugging Face transformers and TRL libraries |
|
|
|
## License |
|
|
|
This model is released under the Apache 2.0 license, following the base model's licensing terms. |
|
|
|
--- |
|
|
|
*Model trained in August 2025 with LoRA-based supervised fine-tuning on the HuggingFaceH4/Multilingual-Thinking dataset.*
|
|