SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities Paper • 2502.12025 • Published Feb 17 • 3
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 17 days ago • 157
dphn/Dolphin-Mistral-24B-Venice-Edition Text Generation • 24B • Updated about 1 month ago • 3.75k • 146
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 18 days ago • 316