NOVER1

NOVER1 is a series of large reasoning models that can perform general reasoning across many text-to-text tasks.

NOVER1 is trained using NOVER (NO-VERifier Reinforcement Learning), proposed in the paper "NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning".

It is trained on several general reasoning datasets with free-form text answers. A proxy reward based on reasoning perplexity removes the need for a rule-based verifier or a separate reward model.
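
To make the idea of a perplexity-based signal more concrete, here is a minimal sketch of how perplexity is computed from per-token log-probabilities. The function name `perplexity` and the example numbers are illustrative assumptions. In a NOVER-style setup the log-probabilities would presumably come from a model scoring the reference answer given the prompt and the generated reasoning. The exact reward defined in the paper is not reproduced here.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probs of the same reference answer, scored after two
# different reasoning traces (made-up numbers, for illustration only).
logprobs_after_good_reasoning = [-0.1, -0.2, -0.1]
logprobs_after_poor_reasoning = [-1.5, -2.0, -1.0]

ppl_good = perplexity(logprobs_after_good_reasoning)
ppl_poor = perplexity(logprobs_after_poor_reasoning)

# Lower perplexity means the reference answer is more likely after that reasoning.
print(ppl_good < ppl_poor)
```

A signal of this kind needs only a reference answer and a language model, which is why no rule-based verifier is required.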

Details

Base Model: Qwen/Qwen3-4B-Instruct-2507
Training Method: NOVER with LoRA finetuning
Dataset: Modified NOVEReason_5k_reasoning dataset with custom tags
Finetuning Detail: NOVER1-Qwen3-4B config

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thinkwee/NOVER1-Qwen3-4B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "What is machine learning?"

# The prompt asks the model to answer in the <reasoning>/<answer> format.
messages = [
    {
        "role": "user",
        "content": f"Question: {question}\n\nAnswer the question and return in the following format:\n\n<reasoning>\n...\n</reasoning>\n\n<answer>\n...\n</answer>",
    }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# max_new_tokens limits only the generated part; max_length would also count the prompt.
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens rather than searching the text for "assistant\n".
prompt_length = inputs["input_ids"].shape[1]
assistant_response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(assistant_response)

Example output (sampling is enabled, so results will vary):

<reasoning> Machine learning is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming. Instead of being given specific instructions for every task, machine learning systems learn patterns and relationships from data. Through training on large datasets, these models can make predictions or decisions, improving their performance over time. Common types include supervised learning (where the model learns from labeled data), unsupervised learning (where patterns are found in unlabeled data), and reinforcement learning (where the model learns by receiving rewards or penalties). Applications span areas such as image recognition, natural language processing, recommendation systems, and autonomous vehicles. </reasoning>

<answer> Machine learning is a field of artificial intelligence that enables computers to learn from data and improve their performance on tasks without being explicitly programmed for each specific case. </answer>
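
Since the response wraps its content in `<reasoning>` and `<answer>` tags, downstream code will usually want the two parts separately. The following is a minimal sketch of one way to do that with a regular expression. The helper name `extract_tag` and the shortened sample string are assumptions made for illustration and are not part of the model's tooling.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    # DOTALL lets the content span multiple lines.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Shortened stand-in for a model response (hypothetical text).
sample = (
    "<reasoning>\nLearns patterns from data.\n</reasoning>\n\n"
    "<answer>\nA field of AI.\n</answer>"
)

reasoning = extract_tag(sample, "reasoning")
answer = extract_tag(sample, "answer")
print(answer)
```

Returning `None` when a tag is missing lets the caller decide how to handle responses that are truncated or do not follow the format.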

Citation

If you use this model, please cite the NOVER paper:

@article{liu2025noverincentivetraininglanguage,
      title={NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning}, 
      author={Wei Liu and Siya Qi and Xinyu Wang and Chen Qian and Yali Du and Yulan He},
      year={2025},
      eprint={2505.16022},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16022}, 
}
