Looking forward to testing this
I wonder if gpt-oss uncensored will be worth it. I'll certainly give it a try and update on my findings.
I'll be looking at it too very soon.
Hey, can you change the architecture to gptoss instead of gpt-oss please? Ollama does not support gpt-oss, only gptoss...
Everyone gets the wrong one :(
Hey, can you change the architecture to gptoss instead of gpt-oss please?
Does that mean it's a typo in the architecture ID?
@cutycat2000x
"arch" is auto-set by Hugging face ;
@Utochi
First quants are up ; Q5_1 is a wee bit better than IQ4_NL.
Other quants in testing.
Make sure you see the notes for operation and settings.
Hi DavidAU,
Thanks for the nice work.
Can you try to fix the abliterated Qwen3-30B-A3B-Instruct-2507? It works really well without abliteration, but after abliteration it doesn't feel alive. Maybe your method can fix it.
@calvin2021y
I will give it a go ... ;
Hi @dav
I did a simple test of all 3 gpt-oss models you posted, using them as writers; none of them can match unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF.
unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF can follow very complex instructions with very natural and vivid wording, even in writing style. Qwen3-The-Xiaolong-Josiefied is also very good, but compared to 2507 it is less natural.
gpt-oss will end the story prematurely; it can follow instructions but does not dig into them.
Qwen3-The-Xiaolong-Josiefied can dig into the instructions, but the wording and details are not as good.
Qwen3-4B-Instruct-2507-NEO-Imatrix-PlayGround-GGUF is like child's play; it cannot handle complexity.
unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF is much, much better as a writer. Maybe you can add more magic into it?
Qwen3-30B-A3B-Instruct-2507 / Ablit version added ; working on this shortly.
My own experience in general (regarding ALL LLMs) is that under 8B they may be fast and competent at certain things, but they are not creative, tend to reuse certain words/phrases a lot more, and complex concepts like running multiple characters or 'roleplay within a roleplay' are just beyond them.
8B-20B - Mileage may vary; very good at summarizing, limited analysis, rewriting a paragraph or two, maybe short stories or generating characters and details.
24B - Decent; can handle multiple characters (2 or 3) but not 'roleplay within a roleplay'.
Keep in mind, under 30B, if you want NSFW/sex RP they tend to rush the scene, so expect to reach the end of very short scenes in about 2 posts.
50B+ - This is where they tend to do a really good job; other than certain phrases getting repeated, it's hard to tell it was written by an LLM.
70B+ - Where I find the best roleplayers are.
I may concentrate more on roleplay, but the 50B+ models handle natural language really well, at least for my English, where I usually don't have to reiterate points that much.
@yano2mch Hmm... 8B is not repetitive; it depends on the merge, the quants, and heavily on the sampler parameters.
For example, I use Q5_K_M and a character card with more than 10K tokens, and after heavy experimentation I managed to get near-perfect results without repeated phrases and with good emotional connections.
I might show the results in the SpinFire discussion; currently finalizing the tests and will post updated settings.
For 20B and under I tended to use Q8_0, or Q6_K with i1 (imatrix) weighting.
Curious. I see certain words generally get a lot more use; 'ghosting his/her ear', 'eyes widen' and 'and don't think for one moment' are some of the ones that repeat more than others.
Not sure, but some models (I think only 3, all sub-20B) started dropping words past 10k context, and the quality of the output really dropped; these aren't important words, you can rebuild what's missing without issue, but they are usually qualifiers: he/she, the, a, etc.
@yano2mch You need to stabilize the model for better outputs over longer and longer context; all the necessary instructions are written in the SuperNova and SpinFire threads. As of now, only the Top_A needs to be changed per model (if you follow my settings and instructions).
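For readers who haven't opened those threads, here is a purely illustrative sketch of the kind of sampler preset being discussed; the numbers are placeholders of my own, not the actual SuperNova/SpinFire settings, and Top_A is the one value meant to be tuned per model.

```python
# Illustrative template only: placeholder values, NOT the settings from the
# SuperNova / SpinFire threads. Top_A is the per-model knob mentioned above.
sampler_preset = {
    "temperature": 0.8,          # creativity vs. coherence trade-off
    "top_a": 0.1,                # adjust per model (Top_A sampling)
    "top_p": 0.95,               # nucleus sampling cutoff
    "top_k": 40,                 # hard cap on candidate tokens
    "min_p": 0.05,               # floor relative to the most likely token
    "repetition_penalty": 1.05,  # keeps recurring phrases in check
}

# A dict like this maps onto front-ends that expose Top_A (SillyTavern,
# KoboldCpp-style APIs); plain llama.cpp bindings may not accept every key.
for name, value in sampler_preset.items():
    print(f"{name:>20}: {value}")
```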
I just tried your potential-models collection at 2-bit; the 70B Anubis sometimes cannot follow instructions like: here is what already happened in the previous scene, you need to write the rest from these tips: .....
And sometimes they don't know what has to be done.
I think they have more knowledge but less ability to follow instructions than the recently released models. I tried the 2-bit intel/Qwen3-30B-A3B-Instruct-2507, and it doesn't have this kind of problem (it does try to end soon, like you said, but compared to all the others I tried it made a full story with decent details; the unsloth UD version at 6-bit is the best I can run on my device).
But I agree with your point: the 70B is much better with details.
I take it English isn't your first language. :P
There are plenty of small models that are creative - Dark Planet Series (8B), Grand Horror (16B), Darkest Universe, and others.
But the real powerhouse (small size) is this one (and its bros and sis'):
https://huggingface.co/DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
Update:
Source and quants up at:
https://huggingface.co/DavidAU/Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER