Who knows? Perhaps attention really is all you need. Maybe our context window is really large. Or our compression is really effective. Perhaps adding external factors might be able to indirectly teach the models to act more in line with social expectations such as being embarrassed to repeat the same mistake, unlocking the final piece of the puzzle. We are still stumbling in the dark for answers.