Your post made me laugh because I experienced the same as you but the other way around. I switched from Claude to a multi-model harness a couple of days ago and the first model I tried was GLM5.2.
I gave it some simple code porting exercises and watched dumbfounded at the reasoning, which was more like the ravings of a lunatic - but lo and behold, after much confusion and a dizzying number of eureka moments the task was completed very successfully.
I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.
To be clear, I’m surprised the results were good not because these models aren’t GPT or Claude, but because the line of reasoning was so bonkers. Coming from Claude, I was just not used to seeing this, but I’ll bet it’s just as nuts with the frontier models and we’re just not allowed to see it (I’m about to read the links you shared).
Agree wholeheartedly that transparency is of grave importance.
> Coming from Claude, I was just not used to seeing this
Claude doesn't show its internal CoT?
If you look at the "thinking" traces as expressions of uncertainty rather than literal thinking, they make more sense.
Consider debugging - you start off in one place, think you have worked out what is happening, and then there is an "oh, but what about xxx" moment and you explore another branch. Then you "have it for sure" until you find another edge case.
The LLM is doing something analogous. It's writing circuits to try to emulate your program. Each time it gets one that seems right it is very sure that circuit is correct, but then it finds another thing.
At any point you can stop and go "write code now" and it will, and the code will seem fine provided it hasn't hit one of these edge cases.
Turning up thinking time is literally forcing more exploration.
The words that come out are amusingly dramatic, but... TBH when I debug I'm often like "WTF" and throwing my hands up in the air at some gotcha I didn't expect.
Yeah isn't that thinking weird?
Now I see the issue clearly! But wait... now I have the full picture! But wait... Found it!
I gave up a few times because of it at first until I realized I just had to let GLM get on with it and what came out was great!
But once it was outright endearing: on a challenging bug, it said, "I have been very thorough." Then it escalated where to look and aced it. Built-in Confucian values.
I'm like 90% sure the harnesses inject those tokens into the AI to make it check its work. Things like "but wait" and "but what if..." etc. Like literally inserting them artificially and then saying "carry on from here" to the AI, so it's working as though it had output those words itself and has an opportunity to turn around if it's making a mistake. Repeat a bunch of times and we get something useful.
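To make the idea concrete, here's a rough sketch of what that kind of injection loop could look like. This is purely illustrative and not how any particular harness is known to work: `think_with_nudges`, `NUDGES` and `fake_model` are made-up names, and `fake_model` is a toy stand-in for a real model call.

```python
from typing import Callable

# Hypothetical phrases a harness might inject (an assumption, not a known list).
NUDGES = ["But wait,", "But what if"]


def think_with_nudges(generate: Callable[[str], str], prompt: str, rounds: int = 2) -> str:
    """Let the model write, append a nudge as if the model said it, then continue."""
    trace = prompt
    for i in range(rounds):
        trace += generate(trace)                 # model continues from everything so far
        trace += " " + NUDGES[i % len(NUDGES)]   # injected text, "carry on from here"
    trace += generate(trace)                     # final continuation with no nudge after it
    return trace


def fake_model(context: str) -> str:
    """Toy stand-in for an LLM call: the reply depends only on how the context ends."""
    if context.endswith("But wait,"):
        return " I missed an edge case. Fixed."
    if context.endswith("But what if"):
        return " the input is empty? Handled."
    return " Found it!"


result = think_with_nudges(fake_model, "Task: port the function.")
print(result)
```

With a real model in place of `fake_model`, each nudge would give it a chance to revisit its last conclusion before the final answer, and `rounds` would control how many of those second looks it gets.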
I started noticing those in GitHub Copilot right around when they turned off thinking traces at the end of last year.
If there’s one thing I’ve learned these past couple of days, it’s to resist the temptation to jab the escape button and start waving my arms! I wonder how much of this cyclical self doubt / self congratulating I go through in my own thoughts without even realising it. If you could verbalise or articulate all the half thoughts, snatches of ideas, feelings and ruminations the human mind goes through on some tasks it might be even more bizarre (or could just be me)