Cute idea, but you're never gonna blow your token budget on output. Input tokens are the bottleneck, because the agent's ingesting swathes of skills, directory trees, code files, tool outputs, etc. The output is generally a few hundred lines of code and a bit of natural language explanation.
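
As a rough back-of-envelope sketch (the token counts and prices below are hypothetical, chosen only to show the shape of the argument, not real figures):

```python
# Back-of-envelope: share of per-turn cost from input vs. output tokens.
# All numbers are hypothetical illustrations, not real pricing.
input_tokens = 50_000     # skills, directory trees, code files, tool outputs
output_tokens = 1_500     # a few hundred lines of code plus explanation
price_in = 3 / 1_000_000    # $/input token (hypothetical)
price_out = 15 / 1_000_000  # $/output token (hypothetical, 5x input rate)

cost_in = input_tokens * price_in
cost_out = output_tokens * price_out
print(f"input share of cost: {cost_in / (cost_in + cost_out):.0%}")
```

Even with output priced several times higher per token, the sheer input volume dominates under these assumptions.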

In single-turn use, yeah, but across dozens of turns there's probably value in optimizing the output.

Btw your point lands just as well without "Cute idea, but" https://odap.knrdd.com/patterns/condescending-reveal

I didn't mean it as condescending. I meant it literally is cute: A neat idea that is quite cool in its execution.

Pretty neat site you've got there. You should submit it to Show HN. I had fun clicking around - it's like TVTropes, except the examples make me angry, lol.

It would be pretty fun to train an LLM on this site and then have it flag my comments before I get downvoted, haha.

Thanks! I want to do something similar to your LLM suggestion; the endgame is tooling for forums and individuals to improve the quality of discourse. More broadly, I think recent LLM advancements make it possible to assist with self-improvement (e.g., former startup Humu's nudges, but for everyone instead of just B2B).

Oh boy, every example reads like a HN comment!

You're practicing your own pattern ;)

Like your site and good luck with improving discourse on the Internet.

Good point, and it's actually worse than that: the thinking tokens aren't affected by this at all (the model still reasons normally internally); only the visible output gets compressed into caveman. And the model may actually need more thinking tokens to figure out how to rephrase its answer in caveman style.

Grug says you can tune how much each model thinks. Is not caveman, but similar. Also, thinking is trained with RL, so tends to be efficient, less fluffy. Also, model (as seen locally) always drafts answer inside thinking, then output repeats it, so change to caveman is not really extra effort.
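
Grug's "tune how much each model thinks" maps to explicit thinking budgets in some APIs. A sketch of what the request payload looks like, modeled on Anthropic's documented extended-thinking parameter (the model name is just an example; verify field names against current docs before relying on this):

```python
# Sketch of a request payload with an explicit thinking budget,
# modeled on Anthropic's extended-thinking API (check current docs).
payload = {
    "model": "claude-sonnet-4-20250514",  # example model name
    "max_tokens": 2_000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 1_024,  # cap on internal reasoning tokens
    },
    "messages": [
        {"role": "user", "content": "Explain caching. Caveman style."},
    ],
}
```

The budget caps reasoning separately from visible output, which is exactly the split discussed above: you can keep the caveman output short without starving the model's internal thinking.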