Flash Attention Required When Running Model
#13, opened by Chillarmo
Hey, I recently found this model and think it is really cool.
However, it seems that I need Flash Attention to run it.
Could you guys make a version that does not require Flash Attention (maybe using SDPA)?
Or at least supply some way of running the model without Flash Attention installed?
Refer to this: https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536
You can run the model on CPU or GPU with SDPA attention using Hugging Face Transformers inference. (vLLM may run out of memory.)
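For anyone landing here, a rough sketch of what loading with SDPA could look like in Transformers. `attn_implementation` is a standard `from_pretrained` argument, but whether this model's custom code honors it without further changes is an assumption on my part; the linked comment has the actual steps. `pick_attn_implementation` and `load_model` are hypothetical helper names, and `model_id` is left for you to fill in.

```python
import importlib.util


def pick_attn_implementation(prefer_flash: bool = True) -> str:
    """Return "flash_attention_2" only if flash_attn is importable, else "sdpa"."""
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


def load_model(model_id: str):
    """Load a model with the chosen attention backend (hypothetical helper)."""
    # Imported lazily so the helper above works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM

    use_gpu = torch.cuda.is_available()
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=pick_attn_implementation(),
        # Assumption: bfloat16 on GPU, float32 on CPU; adjust to your hardware.
        torch_dtype=torch.bfloat16 if use_gpu else torch.float32,
        trust_remote_code=True,  # the model ships custom modeling code
    )
    return model.to("cuda" if use_gpu else "cpu")
```

Calling `pick_attn_implementation(prefer_flash=False)` forces SDPA even when Flash Attention is installed, which is handy for comparing the two backends.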