Compatibility with Turing GPUs

#6
by Ddopez - opened

Hello, I have a Turing GPU (an NVIDIA GTX 1650 Ti, which is SM_75). Are there any options other than flash attention, so I can still run the model without having to buy a newer GPU? I tried flash-attention 1.0.9, but no luck, as it is outdated.

You can run inference without installing any flash-attention package by setting attn_implementation="eager" or "sdpa". Of course, inference speed will be impacted.
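For example, a minimal loading sketch. The model id and the trust_remote_code flag here are assumptions (check the model card for the actual repo id); the attn_implementation kwarg is the standard transformers parameter being referred to:

```python
from transformers import AutoModelForCausalLM

# "sdpa" and "eager" backends need no flash-attn package installed;
# the model id below is a placeholder, not necessarily the real one.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2.5-9B",        # assumption: replace with the actual repo id
    attn_implementation="sdpa",  # or "eager"
    trust_remote_code=True,
)
```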

That doesn't work. In modeling_ovis_2_5.py, there is

```python
attn_output = flash_attn_varlen_func(
    queries, keys, values, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen
).reshape(seq_length, -1)
```

and no other attention implementation is offered.

AIDC-AI org

Hi, we’ve added SDPA support in the code, so it can now run without flash-attention.
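For reference, here is a minimal sketch (not the repository's actual patch) of how a flash_attn_varlen_func call like the one quoted above can be replaced with PyTorch's built-in torch.nn.functional.scaled_dot_product_attention, assuming a single unpacked sequence and non-causal attention:

```python
import torch
import torch.nn.functional as F

def sdpa_attention(queries, keys, values):
    """Sketch of an SDPA fallback. Inputs are (seq_len, num_heads, head_dim),
    the per-sequence layout flash_attn_varlen_func operates on."""
    # SDPA expects (batch, num_heads, seq_len, head_dim), so add a batch dim
    # and move the head dim forward.
    q = queries.transpose(0, 1).unsqueeze(0)
    k = keys.transpose(0, 1).unsqueeze(0)
    v = values.transpose(0, 1).unsqueeze(0)
    out = F.scaled_dot_product_attention(q, k, v)  # non-causal by default
    # Back to (seq_len, num_heads * head_dim), matching the .reshape above.
    seq_length = queries.shape[0]
    return out.squeeze(0).transpose(0, 1).reshape(seq_length, -1)
```

Note this sketch ignores the cu_seqlens packing that flash_attn_varlen_func supports; with multiple packed sequences you would need an attention mask or a per-sequence loop.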
