Hacker News

That's a fair point. Normally, if you injected the "dog" token, it would cause a set of values to be populated into the KV cache, and those would later be picked up by the attention layers. So the question is: what's fundamentally different if you inject something into the activations instead?
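Here is a toy sketch of the two routes. It uses a single attention head with hypothetical 2-d weights and made-up hidden states, none of it taken from a real model. Route 1 appends the "dog" token's key/value pair to the cache. Route 2 adds the same vector to an existing position's activations before that position's key/value are computed. In both cases a later query reads different values. A real model would have many layers, and an injection at one layer would only change the K/V entries from that layer onward.

```python
import math

# Hypothetical 2-d projection weights for a single toy attention head.
W_K = [[1.0, 0.0], [0.0, 1.0]]
W_V = [[0.5, 0.0], [0.0, 2.0]]

def matvec(m, x):
    return [sum(mi * xi for mi, xi in zip(row, x)) for row in m]

def cache_entry(hidden):
    # Each position's hidden state is projected into a (key, value) pair.
    return (matvec(W_K, hidden), matvec(W_V, hidden))

def attend(query, cache):
    """Softmax attention of one query over the (key, value) pairs in the cache."""
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in cache]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache[0][1])
    return [sum(w * v[i] for w, (_, v) in zip(weights, cache)) / total
            for i in range(dim)]

# Made-up hidden states for a two-position prompt and a made-up "dog" vector.
prompt = [[1.0, 0.0], [0.0, 1.0]]
dog = [0.0, 3.0]
query = [0.0, 1.0]

base_cache = [cache_entry(h) for h in prompt]

# Route 1: inject the "dog" token, which appends a new (key, value) pair.
token_cache = base_cache + [cache_entry(dog)]

# Route 2: add the same vector to position 1's activations, which changes
# that position's existing (key, value) pair instead of adding a new one.
steered = [prompt[0], [a + b for a, b in zip(prompt[1], dog)]]
steered_cache = [cache_entry(h) for h in steered]

base_out = attend(query, base_cache)
token_out = attend(query, token_cache)
steered_out = attend(query, steered_cache)
print(base_out, token_out, steered_out)
```

Mechanically, both routes end up as shifted numbers in the cache. The toy can't say whether a trained model has learned to read the second kind of shift.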

I guess to some extent, the model is designed to take input as tokens, so there are built-in pathways (learned from the training data) for interrogating that input and producing output based on it. There's no trained-in mechanism for converting changes in the activations into output that reflects those changes. But that's not a very satisfying answer.