Has anyone come across an r2d3-style interactive explainer for something as high-dimensional as a Transformer's attention mechanism?
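For concreteness, the mechanism I mean is standard scaled dot-product attention, i.e. softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (my own toy illustration, not from any existing explainer; all names are hypothetical):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays. Returns (output, attention_weights)."""
    d_k = Q.shape[-1]
    # similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over the key axis, numerically stabilized
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a weight-averaged mixture of the value rows
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```

Even this tiny case already has a 4×4 weight matrix per head per layer, which is exactly the kind of thing I'd hope an r2d3-style visualization could make tangible.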