Compatibility with Turing GPUs

#6
by Ddopez - opened

Hello, I have a Turing GPU (an NVIDIA GTX 1650 Ti, which is SM_75). Are there any options other than flash attention, so I can still run the model without having to buy a newer GPU? I tried flash-attention 1.0.9, but no luck, as it is outdated.

You can run inference without installing any flash-attention package by setting attn_implementation="eager" or "sdpa". Of course, inference speed will be impacted.
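For example, a minimal loading sketch. The model id and the trust_remote_code flag here are assumptions (check the model card for the actual repo id); the attn_implementation kwarg is the standard transformers parameter being referred to:

```python
from transformers import AutoModelForCausalLM

# "sdpa" and "eager" backends need no flash-attn package installed;
# the model id below is a placeholder, not necessarily the real one.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2.5-9B",        # assumption: replace with the actual repo id
    attn_implementation="sdpa",  # or "eager"
    trust_remote_code=True,
)
```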

That doesn't work. In modeling_ovis_2_5.py, there is

```python
attn_output = flash_attn_varlen_func(
    queries, keys, values, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen
).reshape(seq_length, -1)
```

and no other attention implementation is offered.

AIDC-AI org

Hi, we’ve added SDPA support in the code, so it can now run without flash-attention.
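For reference, here is a minimal sketch (not the repository's actual patch) of how a flash_attn_varlen_func call like the one quoted above can be replaced with PyTorch's built-in torch.nn.functional.scaled_dot_product_attention, assuming a single unpacked sequence and non-causal attention:

```python
import torch
import torch.nn.functional as F

def sdpa_attention(queries, keys, values):
    """Sketch of an SDPA fallback. Inputs are (seq_len, num_heads, head_dim),
    the per-sequence layout flash_attn_varlen_func operates on."""
    # SDPA expects (batch, num_heads, seq_len, head_dim), so add a batch dim
    # and move the head dim forward.
    q = queries.transpose(0, 1).unsqueeze(0)
    k = keys.transpose(0, 1).unsqueeze(0)
    v = values.transpose(0, 1).unsqueeze(0)
    out = F.scaled_dot_product_attention(q, k, v)  # non-causal by default
    # Back to (seq_len, num_heads * head_dim), matching the .reshape above.
    seq_length = queries.shape[0]
    return out.squeeze(0).transpose(0, 1).reshape(seq_length, -1)
```

Note this sketch ignores the cu_seqlens packing that flash_attn_varlen_func supports; with multiple packed sequences you would need an attention mask or a per-sequence loop.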
