Why do we need to hardcode self._attn_implementation = "eager"
#35
by shantanuagarwal
Thanks a lot for making the code public. Looking into the modeling_nvembed.py file, I noticed two things:
- layer.self_attn.is_causal = False. This makes sense, as we want to enforce bi-directionality.
- self._attn_implementation = "eager". This is the part I am not understanding: why do we need to enforce that the attention implementation be eager? Does that mean sdpa/flash_attention_2 are not supported? (I've roughly paraphrased both settings in the snippet below.)
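To make sure I'm reading the code right, here is my paraphrase of the two modifications on a tiny, randomly initialised Mistral so it runs without a checkpoint. This is my own sketch, not a copy of modeling_nvembed.py, and setting _attn_implementation on the config like this assumes a reasonably recent transformers version:

```python
import torch
from transformers import MistralConfig, MistralModel

# Tiny random-weight Mistral, just to illustrate the two settings.
config = MistralConfig(
    hidden_size=64, intermediate_size=128, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=1000,
)
config._attn_implementation = "eager"  # mirrors the hardcoding I'm asking about
model = MistralModel(config)

# Mark each layer's self-attention as non-causal, as modeling_nvembed.py does.
for layer in model.layers:
    layer.self_attn.is_causal = False
```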
What I am trying to understand is: what would need to change in BidirectionalMistralModel's forward to make it compatible with sdpa/flash_attention_2?
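For concreteness, my (possibly wrong) guess is that the main change for sdpa would be in the attention-mask preparation: instead of the usual 4D causal mask, expand the 2D padding mask into a bidirectional additive mask. Something like this hand-rolled helper, where the name and shape handling are mine, not taken from the repo:

```python
import torch

# My guess at a bidirectional mask for an sdpa forward path: expand the
# (batch, seq) padding mask to (batch, 1, seq, seq) with no causal triangle.
# Hand-rolled stand-in, not taken from modeling_nvembed.py.
def bidirectional_4d_mask(attention_mask_2d: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    bsz, seq_len = attention_mask_2d.shape
    # 1 -> attend, 0 -> padded; broadcast the key mask across the query dimension.
    expanded = attention_mask_2d[:, None, None, :].expand(bsz, 1, seq_len, seq_len).to(dtype)
    # Additive mask: 0 where attended, large negative where padded.
    return (1.0 - expanded) * torch.finfo(dtype).min
```

For flash_attention_2, my impression is that the 2D padding mask is passed through as-is and causality comes from the is_causal flag, so perhaps less would need to change there, but I may be missing something.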