Please consider training a smaller dense variant in the same style.
#46 · opened by drmcbride
With your training mix, even just having a dense 12B model (which could likely fit in Unsloth's free-tier Colab for fine-tuning) and a dense 120-200B model would be very good for at-home users, and would let fine-tuning actually happen at a reasonable upfront cost. MoE is amazing for speed and inference cost reduction, but its RAM and VRAM requirements are holding back a lot of at-home fine-tunes, and the kind of grassroots discovery we saw early on in LLMs.
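
To give a rough sense of what "reasonable upfront cost" looks like, here is a minimal sketch of QLoRA fine-tuning with Unsloth, assuming a hypothetical dense ~12B checkpoint (the model name below is a placeholder, since no such variant exists yet) loaded in 4-bit so the base weights fit in the ~15 GB of VRAM on a free-tier Colab T4:

```python
# Minimal sketch, not an official recipe: QLoRA fine-tuning of a hypothetical
# dense ~12B checkpoint with Unsloth on a free-tier Colab T4 (~15 GB VRAM).
from unsloth import FastLanguageModel

# "org/dense-12b" is a placeholder for the requested dense variant.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="org/dense-12b",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps ~12B base weights around 6-7 GB
)

# Attach LoRA adapters so only a small fraction of parameters are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # trades compute for memory headroom
)
```

A dense model makes this kind of setup straightforward. With an MoE of comparable total parameter count, all the expert weights still have to sit in memory even though only a few experts are active per token, which is exactly the RAM/VRAM wall described above.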