---
library_name: transformers
pipeline_tag: text-generation
license: cc-by-nc-4.0
---
This repository contains the Guru-32B model (base model: Qwen2.5-32B), presented in the paper "Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective".
The reported scores were obtained by sampling with temperature=1.0 and top_p=0.7.
Please refer to the paper for more details.
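
The sampling settings stated above can be passed to the standard `transformers` generation API. The following is a minimal sketch, not the authors' evaluation script. The repository id `your-org/Guru-32B` is a placeholder, the `max_new_tokens` default is an arbitrary choice, and `device_map="auto"` assumes the `accelerate` package is installed. Whether a chat template or a specific prompt format is required is not stated in this card, so the sketch feeds the prompt as plain text.

```python
# Sampling settings quoted in this model card for the reported scores.
SAMPLING_KWARGS = {"do_sample": True, "temperature": 1.0, "top_p": 0.7}

# Assumption: placeholder repo id; replace it with the actual Hub path of this repository.
MODEL_ID = "your-org/Guru-32B"


def build_generation_kwargs(max_new_tokens: int = 512) -> dict:
    """Return keyword arguments for `model.generate` using the card's sampling settings.

    The `max_new_tokens` default is an assumption, not a value from the card.
    """
    return {**SAMPLING_KWARGS, "max_new_tokens": max_new_tokens}


def generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Load the model and sample one continuation for a plain-text prompt."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # assumption: requires `accelerate`; shards across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **build_generation_kwargs(max_new_tokens))
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is 17 * 23?")` downloads the weights on first use and needs enough accelerator memory for a 32B-parameter model. Because sampling is enabled, repeated calls can return different outputs.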