OOM on 3090

#52
by TheBigBlockPC - opened

I tried running this LLM on my dual 3090 PC, but it runs out of memory on a single 3090 and even on both. Quantizing with bitsandbytes doesn't work either. I'm using transformers. Can someone help me fix this?

This is the 120B model, which is over 60 GB, so it won't fit on two 3090s. I suspect you're better off using the smaller model rather than lower quants of this one.

Wrong model, I meant to comment on the 20B one.

Please try with transformers installed from source; we recently merged this PR!

Fixed it using ChatGPT.
For the kernels you need to run:

pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels

You need to install transformers using this command:

pip install git+https://github.com/huggingface/transformers.git 

Update Triton as the last step, because pip sometimes just downgrades it.
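
For reference, here's a minimal sketch of loading the model once those installs are done. The model id openai/gpt-oss-20b is my assumption (swap in the one you're actually using), and device_map="auto" needs accelerate installed:

# Minimal sketch: load the 20B model with transformers built from source.
# "openai/gpt-oss-20b" is an assumed model id for "the 20b one" in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # spread layers across both 3090s
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0]))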

TheBigBlockPC changed discussion status to closed
