Compatibility with Turing GPUs
Hello, I have a Turing GPU (NVIDIA GTX 1650 Ti), which is SM_75. Are there any options other than flash-attention, so I can still run the model without having to buy a newer GPU? I tried flash-attention 1.0.9, but no luck, as it is outdated.
You can run inference without installing any flash-attention package by setting `attn_implementation="eager"` or `"sdpa"`. Of course, inference speed will be impacted.
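For intuition, here is a minimal sketch of what the two backends compute: `"sdpa"` calls PyTorch's fused `scaled_dot_product_attention`, while `"eager"` writes the same `softmax(QKᵀ/√d)V` math out explicitly, so both return the same result (the shapes and values below are illustrative, not from the model):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq, head_dim) — arbitrary illustrative shapes
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# "sdpa": PyTorch's built-in scaled-dot-product-attention (may use a fused kernel)
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# "eager": the same attention math written out explicitly
scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
out_eager = scores.softmax(dim=-1) @ v

print(torch.allclose(out_sdpa, out_eager, atol=1e-5))  # True
```

Neither path requires the `flash-attn` package, which is why they work on SM_75 GPUs.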
That doesn't work. In modeling_ovis_2_5.py, there is

```python
attn_output = flash_attn_varlen_func(
    queries, keys, values, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen
).reshape(seq_length, -1)
```

and no other attention implementation is offered.
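For reference, a hedged sketch of how that `flash_attn_varlen_func` call could be replaced with an SDPA fallback. This simplifies the variable-length batching (`cu_seqlens`) to a single packed sequence, and the shapes below are illustrative assumptions, not taken from the model:

```python
import torch
import torch.nn.functional as F

# Assumed flash-attn layout: (seq_length, num_heads, head_dim); values are dummies.
seq_length, num_heads, head_dim = 16, 4, 32
queries = torch.randn(seq_length, num_heads, head_dim)
keys = torch.randn(seq_length, num_heads, head_dim)
values = torch.randn(seq_length, num_heads, head_dim)

# SDPA expects (batch, heads, seq, head_dim): transpose in, attend, transpose back.
q = queries.transpose(0, 1).unsqueeze(0)
k = keys.transpose(0, 1).unsqueeze(0)
v = values.transpose(0, 1).unsqueeze(0)
out = F.scaled_dot_product_attention(q, k, v)

# Same final shape as the original .reshape(seq_length, -1)
attn_output = out.squeeze(0).transpose(0, 1).reshape(seq_length, -1)
print(attn_output.shape)  # torch.Size([16, 128])
```

A real drop-in replacement would additionally have to loop over (or mask by) the `cu_seqlens` boundaries so sequences in the packed batch don't attend to each other.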
Hi, we’ve added SDPA support in the code, so it can now run without flash-attention.