The GPU memory is not released
After a large number of requests, GPU memory is not released, eventually causing an OOM error.
Having the same problem here; it makes the model unusable.
Hi,
Thank you for the quick reply! Could you try running this script with this model added? https://gist.github.com/joe32140/3c38f377750202d7803b8c0fa0ef1e8b#file-evaluate_code_tasks-py-L196-L199
CodeRankEmbed always consumes much more VRAM than other models of similar size, which also makes it much slower. I believe there's a GPU memory management issue in the modeling code. (I adjusted the batch size from 32 down to 4, but it didn't help.)
If I had to make a quick guess, it's due to the max sequence length being longer than that of the models you compared against. You can manually override this to be shorter if needed.
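For anyone hitting the same OOM, here is a minimal sketch of the two mitigations discussed above: encoding in small batches while forcing buffer reclamation between them, and capping the max sequence length. The `encode_in_batches` helper and the batch size are illustrative, not part of the library; the `max_seq_length` override shown in the comment assumes the model loads through sentence-transformers, and the value 512 is an assumption, not the model's default.

```python
import gc

def encode_in_batches(encode_fn, texts, batch_size=4):
    """Encode `texts` in small batches so intermediate buffers can be
    reclaimed between batches instead of accumulating until OOM."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(encode_fn(texts[start:start + batch_size]))
        gc.collect()  # with torch, also call torch.cuda.empty_cache()
    return embeddings

# Hypothetical usage with sentence-transformers (adjust the model id
# and the length cap to your setup):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("CodeRankEmbed", trust_remote_code=True)
#   model.max_seq_length = 512  # cap the context window to cut VRAM use
#   embs = encode_in_batches(model.encode, corpus, batch_size=4)
```

Shortening `max_seq_length` trades recall on very long inputs for a large drop in peak activation memory, which is usually the right trade when the default context window is the source of the OOM.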