Pretty sure the secret sauce is in the summarised thinking. Maybe better though process… But I have a feeling it’s server side tools and a scratch space to prepare the reply.

Sometimes the summarised thoughts include stuff that makes no sense unless it’s got a workspace on the server. Stuff like “I am now writing x to file y”.

Not championing it, but this is where something like OpenClaw comes into play, right? The harness around the model, the ability to call tools, etc.