Extremely interesting but I don't understand why it was written by an LLM. Either the frontier models are far better than I realized, or else writing this document required a lot of manual work regardless, at which point why not keep it in your own voice?

> The agent did not know that would also halve the LUT count. It found out by doing it and watching the synthesizer.

So I guess this is an example of an LLM anthropomorphizing and making wild conjectures about the internal workings of a different LLM.

Yeah I find this current LLM voice very tiring to read; I get enough of it day-to-day wrangling Claude and others. I don’t think ‘writing’ this took very much work though; the prompt was probably something like “read the research logs, and write a blog post with charts showing our amazing results and hammering on the idea that verifiers matter”. The rest you could go have a coffee for.

That said, the core idea of this — verification matters a lot — is well received, and in fact, this is totally awesome in terms of results. They mention at the end they’re not sure how much of this is microtuned against the benchmark, a sin that many CPU companies cheerfully commit and have committed over the last 40 years btw, so I’d be interested in a followup with more general benchmarking. Either way, amazing.

Yeah, you are totally right. It's a work in progress, and the post was written by an LLM - I'm trying to improve on it (dash pun intended).

Regarding the benchmark overfitting, absolutely, it's pretty much overfitted. This CPU will only be as good as its benchmark. If I have the time I will try to get some applications and optimize for those.