Flash Attention Required When Running Model

#13
by Chillarmo - opened

Hey, I recently found this model and think it is really cool.

However, it seems that I need Flash Attention to run it.

Could you guys make a version that does not require Flash Attention (maybe using SDPA)?

Or at least supply some way of running the model without Flash Attention installed?

rednote-hilab org

Refer to this: https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536
You can run the model on CPU or GPU using SDPA with HF inference. (vLLM may run out of memory.)
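
For anyone who wants a starting point, here is a rough sketch of the SDPA fallback with Hugging Face Transformers. It is not taken from the linked comment, so treat that comment as the authoritative instructions. The helper names `pick_attn_implementation` and `load_model` are made up for illustration, and `trust_remote_code=True` is an assumption based on the model shipping its own modeling code. Whether that code fully honors `attn_implementation` is not confirmed here.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Use Flash Attention 2 only if the flash_attn package is installed.

    Otherwise fall back to PyTorch's built-in scaled dot product
    attention (SDPA), which needs no extra install.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


def load_model(model_id: str):
    """Hypothetical loader: pass the chosen attention backend to Transformers.

    transformers is imported lazily so the helper above can be used
    (and tested) without it installed.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=pick_attn_implementation(),
        trust_remote_code=True,  # assumption: the repo ships custom model code
    )


if __name__ == "__main__":
    # Only reports the backend that would be used; does not download anything.
    print(pick_attn_implementation())
```

To force SDPA even when Flash Attention is installed, pass `attn_implementation="sdpa"` directly instead of calling the helper.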
