NOVER1

NOVER1 is a series of large reasoning models that can perform general reasoning across many text-to-text tasks.

NOVER1 is trained using NOVER (NO-VERifier Reinforcement Learning), proposed in the paper "NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning".

It is trained on several general reasoning datasets with free-form text answers. A proxy reward based on reasoning perplexity replaces the rule-based verifier or reward model that such training would otherwise require.
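To give a rough feel for what a perplexity-based proxy reward measures, here is a minimal sketch. It assumes the reward favors reasoning that makes the reference answer more likely under a language model; the exact formulation in the NOVER paper may differ. The log-probabilities are made-up illustrative values, and `perplexity` is a hypothetical helper, not part of any NOVER code release.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a token sequence: exp of the negative mean log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs of the same reference answer,
# conditioned on two different sampled reasoning traces (illustrative values).
logprobs_given_good_reasoning = [math.log(0.5)] * 4
logprobs_given_poor_reasoning = [math.log(0.25)] * 4

ppl_good = perplexity(logprobs_given_good_reasoning)  # close to 2.0
ppl_poor = perplexity(logprobs_given_poor_reasoning)  # close to 4.0

# Lower perplexity means the reasoning made the reference answer more likely,
# so under this assumption it would earn the higher proxy reward.
best = min(("good", ppl_good), ("poor", ppl_poor), key=lambda pair: pair[1])[0]
print(best)
```

In a real setup the log-probabilities would come from a model scoring the reference answer tokens given the question and the sampled reasoning, so no verifier or separate reward model is involved.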

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thinkwee/NOVER1-Qwen2.5-7B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "What is machine learning?"

# Ask for the reasoning in <think> tags and the final answer in <answer> tags.
prompt = f"Question: {question}\n\nAnswer the question and return in the following format:\n\n<think>\n...\n</think>\n\n<answer>\n...\n</answer>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens. Slicing the decoded string by
# len(prompt) can be off when decoding does not reproduce the prompt exactly.
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)

print(response)

Example output:

<think>
In the context of artificial intelligence (AI), machine learning is a technique that allows computers to learn from data and improve their performance on a specific task without being explicitly programmed. Machine learning algorithms can analyze large amounts of data and identify patterns and relationships that can be used to make predictions or decisions. This approach is particularly useful in applications where it is difficult or impossible to program the computer to perform a specific task, such as image recognition, natural language processing, and recommendation systems.
</think>

<answer> Machine learning is a technique that involves training a computer model on a large dataset to identify patterns and relationships that can be used to make predictions or decisions. It is based on the idea that computers can learn from data and improve their performance on a specific task without being explicitly programmed. Machine learning algorithms can analyze large amounts of data and identify patterns and relationships that can be used to make predictions or decisions. This approach is particularly useful in applications where it is difficult or impossible to program the computer to perform a specific task, such as image recognition, natural language processing, and recommendation systems. </answer>
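Since the response follows the `<think>` / `<answer>` format requested in the prompt, the two parts can be separated with a small parser. The sketch below is one possible approach; `split_response` is a hypothetical helper name, and it assumes each tag appears at most once and is properly closed.

```python
import re

def split_response(text):
    """Return (reasoning, answer) from a <think>/<answer> response; missing parts are None."""
    def grab(tag):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None
    return grab("think"), grab("answer")

# Short stand-in for a model response in the expected format.
sample = "<think>\nStep by step.\n</think>\n\n<answer> 42 </answer>"
reasoning, answer = split_response(sample)
print(reasoning)
print(answer)
```

Because sampling is enabled, a generation may occasionally omit or truncate a tag, so callers should handle a `None` result rather than assume both parts are present.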

Citation

If you use this model, please cite the NOVER paper:

@article{liu2025noverincentivetraininglanguage,
      title={NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning}, 
      author={Wei Liu and Siya Qi and Xinyu Wang and Chen Qian and Yali Du and Yulan He},
      year={2025},
      eprint={2505.16022},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16022}, 
}

Model details

Model size: 7.62B params
Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen2.5-7B