It’s a simple compiler optimization over bayesian statistics. It’s masters-level stuff at best, given that I’m on it instead of some expert. The codebase is mixed python and rust, neither of which are uncommon.

The issues I ran into are primarily “tail-chasing” ones - it gets into some attractor that doesn’t suit the test case and fails to find its way out. I re-benchmark every few months, but so far none of the frontier models have been able to make changes that have solved the issue without bloating the codebase and failing the perf tests.

It’s fine for some boilerplate dedup or spinning up some web api or whatever, but it’s still not suitable for serious work.

Would you expect a junior engineer to perform better than this?