OOM on 3090

#52
by TheBigBlockPC - opened

I tried running this LLM on my dual 3090 PC, but it runs out of memory on a single 3090 and even on both. Quantizing with bitsandbytes doesn't work either. I'm using transformers. Can someone help me fix this?

This is the 120B model, which is over 60 GB, so it won't fit on two 3090s. I suspect you're better off using the smaller model rather than lower quants of this one.

Wrong model, I meant to comment on the 20B one.

Please try with transformers installed from source; we recently merged this PR!

Fixed it using ChatGPT.
For the kernels you need to run:

pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels

You need to install transformers using this command:

pip install git+https://github.com/huggingface/transformers.git 

Update Triton as the last step, because pip sometimes just downgrades it.
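
For reference, here's a minimal sketch of loading the model once those installs are done. The model id openai/gpt-oss-20b is my assumption (swap in the one you're actually using), and device_map="auto" needs accelerate installed:

# Minimal sketch: load the 20B model with transformers built from source.
# "openai/gpt-oss-20b" is an assumed model id for "the 20b one" in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # spread layers across both 3090s
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0]))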

TheBigBlockPC changed discussion status to closed
