Somewhat relevant: a blog post that likens attention to kernel smoothing, https://bactra.org/notebooks/nn-attention-and-transformers.h... (previously discussed at https://news.ycombinator.com/item?id=38756888)
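
For anyone who wants to see the analogy concretely, here is a rough sketch of how I understand it: a Nadaraya-Watson kernel smoother with an exponentiated, scaled dot-product kernel produces the same output as single-head softmax attention. The function names (`kernel_smoother`, `exp_dot_kernel`, `softmax_attention`) and the random toy data are my own assumptions for illustration, not taken from the linked post.

```python
import numpy as np

def kernel_smoother(queries, keys, values, kernel):
    """Nadaraya-Watson estimate: a kernel-weighted average of the values."""
    weights = kernel(queries, keys)                      # shape (n_queries, n_keys)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values

def exp_dot_kernel(queries, keys):
    """Assumed kernel: exp of the scaled dot product (always positive)."""
    d = queries.shape[-1]
    return np.exp(queries @ keys.T / np.sqrt(d))

def softmax_attention(queries, keys, values):
    """Plain single-head scaled dot-product attention, no masking."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values

# Toy data: 4 queries, 6 key/value pairs, key dimension 8, value dimension 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 3))

smoothed = kernel_smoother(Q, K, V, exp_dot_kernel)
attended = softmax_attention(Q, K, V)
print(np.allclose(smoothed, attended))
```

Swapping `exp_dot_kernel` for, say, a Gaussian kernel on the distance between query and key gives an ordinary kernel smoother, which is roughly the point of the comparison: the softmax is just the normalization step of the weighted average.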