Besides updates to our 14B and 70B, we have a new LFM2-based 1.2B, a Llama 3.2-based 3B, and a Qwen 3-based 8B, all with class-leading Japanese language capabilities.
Per usual, lots of details in the Model Cards for those interested.
I wanted to call attention to ArliAI's success in applying my recent modifications to refusal ablation to a MoE model. Nice work, @OwenArli!
ArliAI/GLM-4.5-Air-Derestricted
Ablation on a MoE model is no small thing; I expect that preserving norms/magnitudes during the intervention respects routing better than naive refusal ablation does.
(I would have tagged their org earlier, but tagging via "@" seemed to be broken.)
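For intuition on the norms/magnitudes point, here's a minimal sketch of one way norm preservation can be folded into refusal ablation on a single weight matrix. The function name, the precomputed `refusal_dir`, and the per-row rescaling are my illustrative assumptions here, not necessarily the exact released implementation:

```python
import torch

def ablate_preserving_norms(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    # W: (d_out, d_in) weight matrix; refusal_dir: (d_out,) direction,
    # assumed precomputed from activation differences (illustrative only).
    r = refusal_dir / refusal_dir.norm()              # unit refusal direction
    orig = W.norm(dim=1, keepdim=True)                # per-row L2 norms before intervention
    W_abl = W - torch.outer(r, r @ W)                 # project r out of the output space
    new = W_abl.norm(dim=1, keepdim=True).clamp_min(1e-12)
    return W_abl * (orig / new)                       # restore per-row magnitudes
```

The intuition: a MoE router makes decisions based on hidden-state geometry, so an intervention that shrinks row magnitudes (as naive projection does) can shift routing in ways unrelated to refusal; restoring the original norms keeps that geometry closer to intact.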
Implemented a proof-of-concept sampler in pure PyTorch and transformers.
Max P is a dynamic token filter that applies Winsorization to cap the probabilities of top tokens. Specifically, a base probability in the range [0,1] caps each individual token's probability; the sampler then redistributes the excess mass proportionally across the remaining tokens.
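For the curious, a minimal sketch of the idea as described above (illustrative only; the single-pass redistribution can, in edge cases, nudge an absorbing token slightly past the cap, which an iterative version would fix):

```python
import torch

def max_p_filter(logits: torch.Tensor, max_p: float = 0.2) -> torch.Tensor:
    """Winsorize the token distribution: cap each probability at max_p,
    then redistribute the removed mass proportionally among uncapped tokens."""
    probs = torch.softmax(logits, dim=-1)
    capped = probs.clamp(max=max_p)                          # cap the top tokens
    excess = (probs - capped).sum(dim=-1, keepdim=True)      # mass shaved off the top
    absorbers = (capped < max_p).float() * capped            # tokens that can take the excess
    denom = absorbers.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    return capped + excess * absorbers / denom               # proportional redistribution

# Usage: filter the distribution, then sample from it.
logits = torch.randn(1, 32000)
token = torch.multinomial(max_p_filter(logits, max_p=0.3), num_samples=1)
```

The effect is to flatten overconfident peaks without touching the tail, which is roughly the opposite lever from min-p-style tail truncation.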