Please consider training a smaller dense variant in the same style.
#46 · opened by drmcbride
With your training mix, even just having a dense 12B model (which could likely fit in Unsloth's free-tier Colab for fine-tuning) and a dense 120-200B model would be very good for at-home users, and would let fine-tuning actually happen at a reasonable upfront cost. MoE is amazing for speed and inference cost reduction, but its RAM and VRAM requirements are holding back a lot of at-home fine-tunes, and the kind of grassroots discovery we saw early on in LLMs.
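
To give a rough sense of what "reasonable upfront cost" looks like, here is a minimal sketch of QLoRA fine-tuning with Unsloth, assuming a hypothetical dense ~12B checkpoint (the model name below is a placeholder, since no such variant exists yet) loaded in 4-bit so the base weights fit in the ~15 GB of VRAM on a free-tier Colab T4:

```python
# Minimal sketch, not an official recipe: QLoRA fine-tuning of a hypothetical
# dense ~12B checkpoint with Unsloth on a free-tier Colab T4 (~15 GB VRAM).
from unsloth import FastLanguageModel

# "org/dense-12b" is a placeholder for the requested dense variant.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="org/dense-12b",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps ~12B base weights around 6-7 GB
)

# Attach LoRA adapters so only a small fraction of parameters are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # trades compute for memory headroom
)
```

A dense model makes this kind of setup straightforward. With an MoE of comparable total parameter count, all the expert weights still have to sit in memory even though only a few experts are active per token, which is exactly the RAM/VRAM wall described above.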