LLM performance has been on a plateau for many months. Anthropic recently released 4.5, and while it improved in some contexts, it failed several times to generate a commit message in one of my workflows. Versions 3.5 through 4 had a near-zero failure rate on this workflow, so it was surprising to see 4.5 fail. It seems that gains on certain benchmarks can come at the cost of quality elsewhere.

LLMs are very useful; I can’t see myself going back to the old way of doing things. But the amounts being invested assume a major breakthrough that we are nowhere near. It’s a gamble, and that’s what innovation is; but you gamble a small portion of your wealth, not your house, and you certainly do not gamble an entire country like the US on a single thing.