I absolutely love Kimi's personality - some of the things it says are so out there! And it's been great for very focused, iterative work.

Its weakness is that it seems to yak on and on when it needs to plan out something big or read through and make sense of how to use a niche piece of a complex library. To the point where it can fill up its 256k window - and rack up a bill. (No cache.) I have had a better experience with GLM 5.1 in those cases.

Anyone out there relate?

Absolutely. I use caveman to help with that: https://github.com/JuliusBrussee/caveman

You can just add "be brief" to the prompt to replace the entire plugin. Same results.

https://www.maxtaylor.me/articles/i-benchmarked-caveman-agai...

Not a bad idea - however

> Caveman only affects output tokens — thinking/reasoning tokens are untouched.

The problem is the thinking. But it could help to tune my system prompt for Kimi.