---
license: apache-2.0
base_model: openai/gpt-oss-20b
tags:
- multilingual
- reasoning
- thinking
- fine-tuned
- lora
- conversational
language:
- multilingual
- en
- es
- ar
- fr
- de
- zh
- ja
- ko
- hi
- ru
datasets:
- HuggingFaceH4/Multilingual-Thinking
library_name: transformers
pipeline_tag: text-generation
---
# GPT-OSS-NEMO-20B: Multilingual Thinking Model
## Model Description
**GPT-OSS-NEMO-20B** is a fine-tuned version of OpenAI's GPT-OSS-20B, enhanced for multilingual reasoning and thinking. It was trained with supervised fine-tuning (SFT) on the HuggingFaceH4/Multilingual-Thinking dataset to improve its ability to reason in multiple languages while maintaining strong performance across diverse linguistic contexts.
## Key Features
- 🌍 **Multilingual Reasoning**: Enhanced ability to think and reason in multiple languages
- 🧠 **Chain-of-Thought**: Improved reasoning capabilities with explicit thinking processes
- 💬 **Conversational**: Optimized for interactive dialogue and question-answering
- 🎯 **Cross-lingual**: Can reason in one language and respond in another
- ⚡ **High Performance**: Built on the robust 20B-parameter GPT-OSS foundation
## Training Details
### Base Model
- **Model**: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
- **Parameters**: 20 billion parameters
- **Architecture**: GPT-OSS (Mixture of Experts)
### Fine-tuning Configuration
- **Method**: LoRA (Low-Rank Adaptation)
- **Rank (r)**: 8
- **Alpha**: 16
- **Target Modules**: All linear layers, with particular focus on the MoE expert layers
- **Target Parameters**:
  - MLP experts in layers 7, 15, and 23 (`gate_up_proj`, `down_proj`); see the sketch below
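For illustration, the configuration above corresponds roughly to a `peft` `LoraConfig` like the sketch below. The expert-layer parameter names are an assumption inferred from the list above, not the verbatim training configuration:
```python
from peft import LoraConfig

# Sketch of the LoRA setup described above (r=8, alpha=16); not the exact training script.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",  # adapt all linear layers
    # MoE expert projections in layers 7, 15, and 23 (names assumed from the list above;
    # targeting nn.Parameter entries requires a recent peft release)
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
    task_type="CAUSAL_LM",
)
```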
### Training Infrastructure
- **Hardware**: 4x NVIDIA H100 GPUs
- **Cloud Platform**: Microsoft Azure NC-series instances
- **Training Framework**: TRL (Transformer Reinforcement Learning)
- **Optimization**: AdamW with cosine learning rate scheduling
### Training Hyperparameters
- **Learning Rate**: 2e-4
- **Batch Size**: 4 per device (16 total with 4 GPUs)
- **Gradient Accumulation**: 4 steps
- **Epochs**: 4
- **Max Sequence Length**: 2048 tokens
- **Warmup Ratio**: 3%
- **LR Scheduler**: Cosine with minimum LR (10% of peak)
- **Gradient Checkpointing**: Enabled
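Expressed as a TRL `SFTConfig`, these hyperparameters correspond roughly to the sketch below (argument names such as `max_length` vary between TRL releases, and `output_dir` is a placeholder):
```python
from trl import SFTConfig

# Sketch of the hyperparameters listed above; not the verbatim training script.
training_args = SFTConfig(
    output_dir="gpt-oss-nemo-20b-sft",         # placeholder output directory
    learning_rate=2e-4,
    per_device_train_batch_size=4,             # 16 per step across 4 GPUs
    gradient_accumulation_steps=4,
    num_train_epochs=4,
    max_length=2048,                           # called max_seq_length in older TRL versions
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # floor the LR at 10% of the peak
    gradient_checkpointing=True,
    bf16=True,
)
```
Training itself then presumably runs through `trl.SFTTrainer` with this config, the LoRA configuration from the previous section, and the dataset described below.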
### Dataset
- **Name**: [HuggingFaceH4/Multilingual-Thinking](https://huggingface.co/datasets/HuggingFaceH4/Multilingual-Thinking)
- **Purpose**: Multilingual reasoning and thinking enhancement
- **Languages**: Multiple languages including English, Spanish, Arabic, French, German, Chinese, Japanese, Korean, Hindi, Russian
- **Training Split**: Full training set
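The dataset can be loaded directly with the `datasets` library, for example:
```python
from datasets import load_dataset

# Load the full training split used for fine-tuning
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
print(dataset)  # inspect the features and number of rows
```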
## Usage
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("justinj92/gpt-oss-nemo-20b")

# Example: ask in Spanish ("What is the capital of Australia?"), reason in Arabic
messages = [
    {"role": "system", "content": "reasoning language: Arabic"},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.6,
    do_sample=True,
)

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Advanced Usage with Custom Reasoning Language
```python
# Specify the reasoning language in the system prompt
reasoning_language = "French"  # can be any supported language
system_prompt = f"reasoning language: {reasoning_language}"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing in simple terms."},
]
```
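The snippet above only builds the prompt; tokenization and generation then follow the same pattern as the Quick Start:
```python
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```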
## Model Capabilities
### Multilingual Reasoning
The model can:
- Think and reason in a specified language (via system prompt)
- Process questions in one language and reason in another
- Maintain coherent logic across language boundaries
- Provide explanations with explicit reasoning steps
### Language Support
Primary languages include:
- **English** (en)
- **Spanish** (es)
- **Arabic** (ar)
- **French** (fr)
- **German** (de)
- **Chinese** (zh)
- **Japanese** (ja)
- **Korean** (ko)
- **Hindi** (hi)
- **Russian** (ru)
## Performance
The model demonstrates improved performance in:
- Cross-lingual reasoning tasks
- Multi-step problem solving
- Contextual understanding across languages
- Maintaining coherence in multilingual conversations
## Limitations
- Performance may vary across different languages
- Complex reasoning in low-resource languages may be limited
- Generated content should be verified for factual accuracy
- May exhibit biases present in the training data
## Technical Specifications
- **Model Size**: ~20B parameters
- **Precision**: BF16 (Brain Floating Point 16-bit)
- **Memory Requirements**: ~40GB VRAM for inference
- **Recommended Hardware**: NVIDIA A100/H100 or similar high-memory GPUs
- **Framework Compatibility**: transformers, torch, accelerate
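The ~40GB estimate follows from storing roughly 20 billion parameters at 2 bytes each in BF16, before activations and KV cache. A minimal variant of the Quick Start loading code that spells out the dtype and reports the resulting weight footprint:
```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "justinj92/gpt-oss-nemo-20b",
    torch_dtype=torch.bfloat16,  # explicit BF16 instead of "auto"
    device_map="auto",
)

# Approximate memory used by the loaded weights, in GB
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```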
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{gpt-oss-nemo-20b,
  title={GPT-OSS-NEMO-20B: A Multilingual Thinking Model},
  author={justinj92},
  year={2025},
  howpublished={\url{https://huggingface.co/justinj92/gpt-oss-nemo-20b}},
  note={Fine-tuned from openai/gpt-oss-20b using HuggingFaceH4/Multilingual-Thinking}
}
```
## Acknowledgments
- **Base Model**: OpenAI GPT-OSS-20B team
- **Dataset**: HuggingFace H4 team for the Multilingual-Thinking dataset
- **Infrastructure**: Microsoft Azure for cloud computing resources
- **Framework**: Hugging Face transformers and TRL libraries
## License
This model is released under the Apache 2.0 license, following the base model's licensing terms.
---
*Model trained in August 2025 with supervised fine-tuning for multilingual reasoning.*