NOVER1

NOVER1 is a series of large reasoning models that can perform general reasoning across many text-to-text tasks.

NOVER1 is trained with NOVER (NO-VERifier Reinforcement Learning), the method proposed in the paper "NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning".

It is trained on several general reasoning datasets with free-form text answers. Instead of relying on a rule-based verifier or a reward model, NOVER uses a proxy reward based on reasoning perplexity.
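To make the idea of a perplexity-based signal more concrete, here is a minimal sketch of how perplexity can be computed from per-token log-probabilities and used to compare reasoning traces. It is an illustration only, not the reward defined in the NOVER paper: the function names `answer_perplexity` and `rank_candidates` and the log-probability values are hypothetical, and it assumes you already have the natural-log probabilities that a model assigns to each token of the reference answer when conditioned on the question and a candidate reasoning trace.

```python
import math
from typing import List, Sequence


def answer_perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a reference answer, given natural-log probabilities of its tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Perplexity is exp of the negative mean log-probability.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_candidates(logprobs_per_candidate: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from lowest to highest perplexity."""
    perplexities = [answer_perplexity(lp) for lp in logprobs_per_candidate]
    return sorted(range(len(perplexities)), key=lambda i: perplexities[i])


# Hypothetical log-probs of the same reference answer after two reasoning traces.
helpful_trace = [-0.1, -0.2, -0.1]    # the answer is likely given this reasoning
unhelpful_trace = [-1.5, -2.0, -1.0]  # the answer is unlikely given this reasoning

print(answer_perplexity(helpful_trace))
print(answer_perplexity(unhelpful_trace))
print(rank_candidates([unhelpful_trace, helpful_trace]))
```

In this sketch, a reasoning trace that makes the reference answer more likely yields a lower perplexity, so it ranks first. How such a score is normalized and turned into a training reward is specified in the paper and is not reproduced here.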

Details

Base Model: Qwen/Qwen2.5-7B
Training Method: NOVER with LoRA finetuning
Dataset: NOVEReason_5k_reasoning
Finetuning Detail: NOVER1-Qwen2.5-7B config

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thinkwee/NOVER1-Qwen2.5-7B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "What is machine learning?"

# Ask for the reasoning in <think> tags and the final answer in <answer> tags.
prompt = f"Question: {question}\n\nAnswer the question and return in the following format:\n\n<think>\n...\n</think>\n\n<answer>\n...\n</answer>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens. Slicing the decoded string by
# len(prompt) can be off if decoding does not reproduce the prompt exactly.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

print(response)

Example output (sampling is enabled, so the text varies between runs):

<think> In the context of artificial intelligence (AI), machine learning is a technique that allows computers to learn from data and improve their performance on a specific task without being explicitly programmed. Machine learning algorithms can analyze large amounts of data and identify patterns and relationships that can be used to make predictions or decisions. This approach is particularly useful in applications where it is difficult or impossible to program the computer to perform a specific task, such as image recognition, natural language processing, and recommendation systems. </think>

<answer> Machine learning is a technique that involves training a computer model on a large dataset to identify patterns and relationships that can be used to make predictions or decisions. It is based on the idea that computers can learn from data and improve their performance on a specific task without being explicitly programmed. Machine learning algorithms can analyze large amounts of data and identify patterns and relationships that can be used to make predictions or decisions. This approach is particularly useful in applications where it is difficult or impossible to program the computer to perform a specific task, such as image recognition, natural language processing, and recommendation systems. </answer>
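Since the response wraps its reasoning and its final answer in `<think>` and `<answer>` tags, downstream code usually wants only the answer part. The following is a minimal sketch of one way to split the two. The helper name `split_response` is hypothetical and not part of the model or of `transformers`; it returns `None` for a missing block, which can happen if generation stops before a closing tag is produced.

```python
import re
from typing import Optional, Tuple

# Non-greedy matches so each pattern stops at the first closing tag.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a tagged response; None where a block is missing."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


example = "<think> Compare both options. </think>\n\n<answer> Option B. </answer>"
reasoning, final_answer = split_response(example)
print(final_answer)
```

Checking for `None` before using the answer is a simple way to detect truncated or malformed generations.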

Citation

If you use this model, please cite the NOVER paper:

@article{liu2025noverincentivetraininglanguage,
      title={NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning}, 
      author={Wei Liu and Siya Qi and Xinyu Wang and Chen Qian and Yali Du and Yulan He},
      year={2025},
      eprint={2505.16022},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16022}, 
}
