You can't take an arbitrary neural network architecture and make it do anything just by giving it an appropriate loss function. In particular, you can't take a simple feed-forward model like a Transformer and train it to be something other than a feed-forward model. If the architecture doesn't have feedback paths (looping) or memory that persists from one input to the next, then no reward function is going to make it magically sprout those architectural modifications!
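To make the distinction concrete, here's a minimal numpy sketch (toy weights, nothing like a real Transformer or RNN) contrasting a stateless feed-forward map with a recurrent one. The point is purely architectural: no choice of loss function changes the fact that the first function has no state to carry between calls, while the second does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feed-forward "model": output depends only on the current input.
W = rng.standard_normal((4, 4))

def feedforward(x):
    # No state is read or written; the same input always gives the same output.
    return np.tanh(W @ x)

# Recurrent "model": a hidden state persists from one input to the next.
class Recurrent:
    def __init__(self):
        self.h = np.zeros(4)                  # memory that survives between calls
        self.Wx = rng.standard_normal((4, 4))
        self.Wh = rng.standard_normal((4, 4))

    def __call__(self, x):
        self.h = np.tanh(self.Wx @ x + self.Wh @ self.h)
        return self.h

x = rng.standard_normal(4)
ff_same = np.allclose(feedforward(x), feedforward(x))

rnn = Recurrent()
rnn_same = np.allclose(rnn(x), rnn(x))

print(ff_same)   # feed-forward: no memory, repeated input -> identical output
print(rnn_same)  # recurrent: second call sees the updated hidden state
```

Training adjusts the numbers inside `W`, `Wx`, `Wh`; it doesn't add a `self.h` to the feed-forward function.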

Today's Transformer-based LLMs are just what the name says: (Large) Language Models, fancy auto-complete engines. They are not a full-blown cognitive architecture.

I think many people do have a good idea of how to build cognitive architectures, and of what the missing parts needed for AGI are, and some people are working on that, but for now all the money and news cycles are going into LLMs. As Chollet says, they have sucked all the oxygen out of the room.