How much is the FFT used in AI? It seems that attention and convolution could both benefit from it.

Some architectures, such as the Fourier Neural Operator (FNO), use FFTs internally. These are particularly popular for deep-learning weather prediction.
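
To give a rough idea of what "using FFTs internally" looks like, here is a minimal NumPy sketch of the kind of spectral layer FNO is built around: transform to frequency space, keep only the lowest few modes, scale them by learned complex weights, and transform back. The name `spectral_conv_1d`, the single-channel 1D setup, and the random `w` standing in for trained parameters are all assumptions for illustration, not the reference implementation.

```python
import numpy as np

def spectral_conv_1d(x, weights, n_modes):
    """Hypothetical single-channel FNO-style spectral layer (illustration only).

    x:       real signal of shape (n,)
    weights: complex array of shape (n_modes,), one multiplier per kept mode
    n_modes: number of low-frequency modes to keep (at most n // 2 + 1)
    """
    n = x.shape[-1]
    x_hat = np.fft.rfft(x)                 # real FFT: n // 2 + 1 complex bins
    out_hat = np.zeros_like(x_hat)         # every mode above n_modes is dropped
    out_hat[:n_modes] = x_hat[:n_modes] * weights  # pointwise multiply in frequency space
    return np.fft.irfft(out_hat, n=n)      # back to a real signal of length n

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
# Random weights stand in for parameters that would normally be learned.
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y = spectral_conv_1d(x, w, n_modes=8)
```

By the convolution theorem, that pointwise multiply is equivalent to a circular convolution with a kernel as long as the input, which is where the connection to convolution comes from. As far as I know, the actual FNO layers mix channels with a weight matrix per mode and add a pointwise linear path and a nonlinearity, so treat this as the core idea only.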