This looks similar to https://github.com/chengzeyi/ParaAttention or https://github.com/ali-vilab/TeaCache.

It’s a shame they don’t compare against or mention them.

Interesting approach! It reminds me of the early insights that neurons in DNNs capture similar concepts.