Prefix scan is a great intro to GPU programming:

https://developer.download.nvidia.com/compute/cuda/2_2/sdk/w...
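To give a feel for what the exercise involves, here is a rough CPU-side sketch of a work-efficient exclusive scan (up-sweep, then down-sweep). It is not taken from the linked document. The name `exclusive_scan` and the zero-padding to a power of two are my own assumptions for illustration. Each inner `for` loop stands in for what would be one parallel step across threads on a GPU.

```python
def exclusive_scan(values):
    """Exclusive prefix sum using an up-sweep / down-sweep over a padded array.

    Hypothetical helper for illustration: a sequential simulation of the
    parallel pattern, not a GPU kernel.
    """
    n = len(values)
    if n == 0:
        return []

    # Assumption: pad with zeros to the next power of two to keep indexing simple.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in place.
    stride = 1
    while stride < size:
        # On a GPU, every iteration of this inner loop would be a separate thread.
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:n]
```

Porting this to a real kernel is where the interesting parts show up: synchronizing between steps, using shared memory, and handling arrays larger than one block.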

After this you should be able to tell whether you enjoy this kind of work.

If you do, try writing a reasonably optimized GEMM (general matrix multiply), then work through the FlashAttention paper and implement a basic version of the algorithm it describes.
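For the FlashAttention step, the central trick as I understand it is computing softmax over blocks of keys while keeping a running max and a running sum, so the full score row never has to be stored. Below is a minimal CPU sketch of that idea for a single query row. The function names and the `block` parameter are hypothetical, and the usual 1/sqrt(d) scaling is left out to keep it short.

```python
import math


def attention_row_reference(q, keys, vals):
    """Plain softmax(q . K^T) V for one query row, for comparison."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(vals[0])
    return [sum(w * v[j] for w, v in zip(weights, vals)) / total for j in range(dim)]


def attention_row_streaming(q, keys, vals, block=2):
    """Same result, but keys/values are consumed one block at a time."""
    running_max = float("-inf")   # max score seen so far
    running_sum = 0.0             # sum of exp(score - running_max)
    acc = [0.0] * len(vals[0])    # unnormalized weighted sum of values

    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = vals[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]

        for s, v in zip(scores, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            acc = [a + p * x for a, x in zip(acc, v)]

        running_max = new_max

    return [a / running_sum for a in acc]
```

A GPU version would run this per tile of queries, with the key and value blocks staged through shared memory, which is where the GEMM experience pays off.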