Even if you specify performance ranges for every individual operation, you can’t specify all possible interactions between operations.

If you don’t care about the code, you’re not checking it in, and every time you regenerate it you can get radically different system performance.

Say you have two operations, A and B, that access some shared data, and you specify that each can’t take more than 1 ms. Independently they’re fine, but when a user runs B and then A immediately after, cache thrashing causes both to blow their budgets. And this only happens in some builds, because sometimes your LLM happens to pick a different algorithm.
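To make that concrete, here’s a minimal sketch of what the checks might look like. Everything in it is a hypothetical stand-in: op_a, op_b, the data sizes, and the 1 ms budget aren’t from any real spec, and whether the combined run actually blows the budget depends entirely on the hardware and on whichever implementation a given build happens to use.

```python
import time

BUDGET_MS = 1.0  # hypothetical per-operation budget from the spec


def measure_ms(fn):
    """Wall-clock a single call in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000


# Stand-ins for the two operations. Each walks its own working set; the
# sizes are purely illustrative -- the point is that either set alone can
# stay cache-resident while the two together cannot.
DATA_A = bytearray(8 * 1024 * 1024)
DATA_B = bytearray(8 * 1024 * 1024)


def op_a():
    return sum(DATA_A[::4096])  # touch one byte per page of A's data


def op_b():
    return sum(DATA_B[::4096])  # touch one byte per page of B's data


# The checks the spec actually asks for: each operation in isolation.
print(f"A alone:   {measure_ms(op_a):.3f} ms (budget {BUDGET_MS} ms)")
print(f"B alone:   {measure_ms(op_b):.3f} ms (budget {BUDGET_MS} ms)")

# The interaction the spec never mentions: B immediately followed by A.
# Whether A still meets its budget here depends on cache behaviour, which
# depends on whichever algorithm this particular build was generated with.
op_b()
print(f"A after B: {measure_ms(op_a):.3f} ms")
```

Only the first two measurements are implied by a per-operation spec; the third is exactly the kind of property that quietly changes every time the implementation is regenerated.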

This kind of thing can happen in ordinary human software development, of course, but constantly shifting implementations that “no one cares about” will make it happen much more often.

There’s already plenty of nondeterminism and chaos in software; adding an extra layer of it is going to be a nightmare.

The same is true of every implementation detail that isn’t in the spec. In a complex system, even the details you don’t think you care about become important when they’re constantly shifting.