Don't use this - use the newer version instead
Model Card for Qwen3-14B-ZeroGPT-beta-step-150
Model Details
This model was fine tuned with GRPO using an inverted score from trentmkelly/zerogpt_distil as the reward function. This model is still very much undercooked, and I have more experimentation to do with the reward functions, however in its current state it tends to generate essays which consistently score around 20% AI on ZeroGPT's AI text classifier.
Due to suboptimal reward functions defined in the training, the writing style is a little bit strange. If I had to describe it, I'd say it writes like a bright high school student who has a very formulaic understanding of how an essay ought to be formatted.
Future updates will hopefully improve accuracy. Follow me to get notified when I post them :)
System Prompt
The system prompt used during training was /no_think\nYou are an essay writer. Write like a human. You will be graded on how human you sound, so try to avoid sounding like AI. Your essay should be 5 paragraphs long.
Thinking mode hasn't been tested nor have other variations from this prompt. Variations will probably affect how the model performs versus the real classifier.
Framework versions
- PEFT 0.15.2
- Downloads last month
- 5