Linear time attention doesn’t work, by principle. Dead end pursuit. Much great research on more efficient quadratic time inference

What about n log n?