Why is learning an appropriate metaphor for changing weights but not for context? There are certainly major differences in what each is good or bad at, and especially in how much data you can effectively feed a model through each route. Each has plenty of properties we wish the other had. But both are ways to take an artifact that behaves as if it doesn't know something and produce an artifact that behaves as if it does.

I learned how to solve a Rubik's cube once, and forgot almost immediately.

I'm not personally fond of metaphors to human intelligence now that we're getting a better understanding of the specific strengths and weaknesses these models have. But if we're going to use metaphors, I don't see how context isn't a type of learning.