
NOVER1
NOVER1 is a series of large reasoning models that can perform general reasoning across many text-to-text tasks.
NOVER1 is trained using NOVER (NO-VERifier Reinforcement Learning), proposed in the paper NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning.
It is trained on several general reasoning datasets with free-form text answers. A proxy reward based on reasoning perplexity removes the need for a rule-based verifier or a reward model.
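To make the idea of a perplexity-based reward concrete, here is a minimal sketch. It is not the NOVER implementation: it assumes that the per-token natural-log probabilities of a reference answer, conditioned on the prompt and a generated reasoning trace, have already been computed by some language model. The function names, the inverse-perplexity mapping, and the numbers are all hypothetical; see the paper for the actual reward design.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def proxy_reward(token_logprobs):
    """Hypothetical mapping: lower perplexity gives a higher reward in (0, 1]."""
    return 1.0 / perplexity(token_logprobs)

# Illustrative log-probabilities of the same reference answer under two
# different reasoning traces (made-up values, not taken from the paper).
after_helpful_reasoning = [-0.1, -0.2, -0.1]
after_unhelpful_reasoning = [-2.0, -1.5, -2.5]

print(proxy_reward(after_helpful_reasoning) > proxy_reward(after_unhelpful_reasoning))
```

In this toy setup, reasoning that makes the reference answer more predictable earns a larger reward, and no answer-checking rule or separate reward model is involved.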
Details
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Training Method: NOVER with LoRA finetuning
- Dataset: Modified NOVEReason_5k_reasoning dataset with custom tags
- Finetuning Detail: NOVER1-Qwen3-4B config
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thinkwee/NOVER1-Qwen3-4B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "What is machine learning?"
messages = [
    {
        "role": "user",
        "content": f"Question: {question}\n\nAnswer the question and return in the following format:\n\n<reasoning>\n...\n</reasoning>\n\n<answer>\n...\n</answer>"
    }
]

# Build the chat-formatted prompt and tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# max_new_tokens bounds only the generated part; max_length would also count the prompt.
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens rather than searching the text for "assistant\n".
prompt_len = inputs["input_ids"].shape[1]
assistant_response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(assistant_response)
Example output:
<reasoning> Machine learning is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming. Instead of being given specific instructions for every task, machine learning systems learn patterns and relationships from data. Through training on large datasets, these models can make predictions or decisions, improving their performance over time. Common types include supervised learning (where the model learns from labeled data), unsupervised learning (where patterns are found in unlabeled data), and reinforcement learning (where the model learns by receiving rewards or penalties). Applications span areas such as image recognition, natural language processing, recommendation systems, and autonomous vehicles. </reasoning><answer> Machine learning is a field of artificial intelligence that enables computers to learn from data and improve their performance on tasks without being explicitly programmed for each specific case. </answer>
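Since the model is prompted to wrap its output in <reasoning> and <answer> tags, downstream code will usually want the two parts separately. The following is a minimal sketch of one way to do that with a regular expression; `parse_response` is a hypothetical helper, not part of the model or of transformers, and it assumes each tag pair appears at most once.

```python
import re

def parse_response(text):
    """Return (reasoning, answer); each is None if its tag pair is missing."""
    def grab(tag):
        # DOTALL lets the tagged content span multiple lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None
    return grab("reasoning"), grab("answer")

sample = "<reasoning> Learns patterns from data. </reasoning><answer> A field of AI. </answer>"
reasoning, answer = parse_response(sample)
print(answer)
```

Returning None for a missing tag pair, rather than raising, lets the caller decide how to handle generations that do not follow the requested format.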
Citation
If you use this model, please cite the NOVER paper:
@article{liu2025noverincentivetraininglanguage,
  title={NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning},
  author={Wei Liu and Siya Qi and Xinyu Wang and Chen Qian and Yali Du and Yulan He},
  year={2025},
  eprint={2505.16022},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.16022},
}