JohannesGaessler
CUDA: add FP32 FlashAttention vector kernel (llama/7188)
03d4b22 unverified