Linear time attention doesn’t work, by principle. Dead end pursuit. Much great research on more efficient quadratic time inference
Linear time attention doesn’t work, by principle. Dead end pursuit. Much great research on more efficient quadratic time inference
What about n log n?