Please consider training a smaller dense variant in the same style.

#46
by drmcbride - opened

With your training mix, even just a 12B dense model (which could fit in Unsloth's free-tier Colab for fine-tuning) and a 120-200B dense model would be very valuable for at-home users, making fine-tuning possible at a reasonable upfront cost. MoE is amazing for speed and inference cost reduction, but the RAM and VRAM it requires is holding back a lot of at-home fine-tunes and the kind of community discovery we saw early on in LLMs.
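For context, the at-home workflow I have in mind is the standard QLoRA-style fine-tune that Unsloth's free Colab notebooks use. A rough sketch is below; the checkpoint name is a hypothetical placeholder for the dense variant being requested, and the exact LoRA settings are just the usual notebook defaults, not anything specific to this model.

```python
# Minimal sketch of the Unsloth QLoRA workflow a dense 12B would enable.
# "your-org/dense-12b" is a hypothetical placeholder, not a real checkpoint.
from unsloth import FastLanguageModel

# Load the base model in 4-bit so it fits in free-tier Colab VRAM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-org/dense-12b",  # placeholder
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)

# From here the training loop follows the standard trl SFTTrainer flow
# shown in Unsloth's Colab notebooks.
```

With a sparse MoE, the full set of expert weights still has to be resident in memory even though only a few experts are active per token, which is exactly the RAM/VRAM wall that blocks this kind of low-cost fine-tune.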
