OOM on 3090
#52
by
TheBigBlockPC
- opened
I tried running this LLM on my dual 3090 PC, but it runs out of memory on a single 3090 and even on both. Quantizing using bitsandbytes doesn't work. I'm using transformers. Can someone help me fix this?
This is the 120B model, which is over 60 GB; it won't fit in two 3090s. I suspect you're better off using the smaller model rather than lower quants of this one.
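(For reference: two 3090s give 2 × 24 GB = 48 GB of VRAM, so a checkpoint over 60 GB can't fit even before counting activations and the KV cache.)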
Wrong model, I meant to comment on the 20B one.
Fixed it using ChatGPT.
For the kernels you need to run:
pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
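A quick sanity check after that install (assuming the package is importable as triton_kernels):
python -c "import triton_kernels; print(triton_kernels.__file__)"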
You need to install transformers from source using this command:
pip install git+https://github.com/huggingface/transformers.git
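To confirm the source install actually took effect (assuming a plain pip environment), check the reported version:
python -c "import transformers; print(transformers.__version__)"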
Update triton as the last step, because pip sometimes just downgrades it.
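That final upgrade is just the standard pip command:
pip install -U triton

With those packages in place, a minimal loading sketch for the 20B model might look like the following. The model id is a hypothetical placeholder (substitute the real checkpoint), and device_map="auto" (which needs accelerate installed) is an assumption to let transformers shard the weights across both 3090s:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/the-20b-model"  # hypothetical placeholder: use the actual 20B checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # assumption: spread layers across both GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))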
TheBigBlockPC
changed discussion status to
closed