Works great - Will you release how you did this?

#6
by deleted - opened

Just tested this, and it definitely does work better than the usual FP8 Dev model, nice job.
I'm asking less for myself than for others who release FP8-based checkpoints. If you explained in detail how you create the scaled version, hopefully other people would start using the approach as well, because the results are great and, as you explained previously, it is more precise.

Any chance you could release the script you used to create the scaled version? I tried making my own but the results were subpar.

Edit: Never mind, I figured it out.

Edit: Actually, the model I converted isn't behaving as expected, and I'm still not sure how the tensor scaling was handled.
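
For anyone following along, here is a rough sketch of what I understand per-tensor scaling to mean. To be clear, this is my assumption and not the author's actual method: it simulates FP8 (e4m3fn) rounding in pure Python so it runs without any libraries, and the helper names (`round_to_e4m3`, `quantize_scaled`, `dequantize`) and the toy weights are made up for illustration. The idea is to divide each tensor by `amax / 448` before casting so the values fill the FP8 range, and to store that scale next to the tensor so it can be multiplied back in at load time.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the e4m3fn format


def round_to_e4m3(x: float) -> float:
    """Simulate rounding a float to the nearest e4m3fn value (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)
    _, exp = math.frexp(a)   # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)     # below 2**-6 the format is subnormal
    step = 2.0 ** (e - 3)    # 3 mantissa bits give 8 steps per power of two
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_scaled(weights):
    """Hypothetical per-tensor scheme: map the largest magnitude to 448."""
    amax = max(abs(w) for w in weights)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Multiply the stored scale back in to recover approximate weights."""
    return [q * scale for q in quantized]


# Toy weights, chosen to be small like many real checkpoint tensors.
weights = [0.001, -0.002, 0.0005, 0.003]

direct = [round_to_e4m3(w) for w in weights]   # plain FP8 cast, no scale
quantized, scale = quantize_scaled(weights)    # scaled FP8 cast
restored = dequantize(quantized, scale)

direct_err = max(abs(w - d) for w, d in zip(weights, direct))
scaled_err = max(abs(w - r) for w, r in zip(weights, restored))
print("max error, direct cast:", direct_err)
print("max error, scaled cast:", scaled_err)
```

With small weights like these, the plain cast lands in the subnormal range and loses most of the precision, while the scaled cast keeps roughly three mantissa bits per value. What I still can't tell is whether the released model uses one scale per tensor like this or something finer-grained, and which layers were left unscaled.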
