There are straightforward emulation settings in which a trained emulator can be more accurate than a single forward run, even when both training and "single forward run" use the same accuracy settings.

Suppose you emulate a forward model y = F(x) by choosing a design X = {x1, ..., xN} and making a training set T = {(x1, y1), ..., (xN, yN)}, where yi = F(xi).

With T, you train an emulator G. You want to know how good y0hat = G(x0) is compared to y0 = F(x0).
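
(To fix ideas, here is a minimal sketch of that setup in Python, with a cheap toy F standing in for your real, presumably expensive, forward model, and plain linear interpolation standing in for whatever regression method G actually is.)

```python
import numpy as np

# Toy stand-in for the (presumably expensive) forward model F.
def F(x):
    return np.sin(2 * np.pi * x)

# Design X = {x1, ..., xN} and training set T = {(xi, yi)}.
N = 20
X = np.linspace(0.0, 1.0, N)
Y = np.array([F(x) for x in X])

# The emulator G: here just piecewise-linear interpolation through T,
# standing in for whatever regression / GP / kernel method you actually use.
def G(x0):
    return np.interp(x0, X, Y)

x0 = 0.37
print(G(x0), F(x0))   # y0hat vs y0
```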

If there is a stochastic element to the forward model F, there will be noise in all of the y's: not just the ones in the training set, but also y0 itself! (Hopefully your noise has expectation 0.)

(This would be the case for a forward model that uses any kind of Monte Carlo under the hood.)
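A toy version of such an F, in which the Monte Carlo under the hood is just a finite-sample average, so every call comes back with zero-mean noise on top of the exact value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward model with Monte Carlo under the hood: F(x) estimates
# an expectation over a random Z with a finite number of samples, so each call
# returns the exact value plus zero-mean noise of std ~ 0.5 / sqrt(n_mc).
def F(x, n_mc=1000):
    z = rng.standard_normal(n_mc)
    return np.mean(np.sin(2 * np.pi * x) + 0.5 * z)

# Repeated calls at the same x scatter around the same expectation.
print([round(F(0.3), 4) for _ in range(3)], "exact:", round(np.sin(2 * np.pi * 0.3), 4))
```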

In this case, because the trained G(x0) is effectively averaging the training outputs at (say) all the nearby x's, you can see variance reduction in y0hat compared to y0. This, for example, would apply in a very direct way to G's that are kernel methods.
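
Here is a small simulation of that effect, using a Nadaraya-Watson (Gaussian-kernel) smoother as G over a noisy toy F. The design, noise level, and bandwidth are all invented for illustration; the point is the comparison of y0hat and y0 against the exact (noise-free) value:

```python
import numpy as np

rng = np.random.default_rng(1)

def exact(x):                      # the quantity F estimates (unknown in practice)
    return np.sin(2 * np.pi * x)

def F(x, sigma=0.05):              # stochastic forward model: exact value + MC-style noise
    return exact(x) + sigma * rng.standard_normal()

def make_emulator(X, Y, bandwidth=0.03):
    # Nadaraya-Watson (Gaussian kernel) smoother: G(x0) is a weighted average
    # of the training y's at design points near x0.
    def G(x0):
        w = np.exp(-0.5 * ((x0 - X) / bandwidth) ** 2)
        return np.sum(w * Y) / np.sum(w)
    return G

X = np.linspace(0.0, 1.0, 101)     # design
x0 = 0.37
err_single, err_emulator = [], []
for _ in range(2000):
    Y = np.array([F(x) for x in X])            # noisy training set
    G = make_emulator(X, Y)
    err_single.append(F(x0) - exact(x0))       # error of one fresh forward run at x0
    err_emulator.append(G(x0) - exact(x0))     # error of the emulator at x0

print("RMS error of y0    (single run):", np.sqrt(np.mean(np.square(err_single))))
print("RMS error of y0hat (emulator):  ", np.sqrt(np.mean(np.square(err_emulator))))
```

With these made-up settings the kernel average runs over a handful of nearby design points, so most of the noise cancels; the price is a small smoothing bias, and y0hat wins as long as that bias stays below the single-run noise level.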

I have observed this in real emulation problems. If you're pushing for high accuracy, it's not even rare to see.

More speculatively, one can imagine settings in which (deterministic) model error, averaged over the nearby training samples that go into y0hat, comes out smaller than the single-point model error affecting y0. (For example, there are some errors in a deterministic lookup table buried in the forward model, and averaging nearby runs of F causes those errors to partially cancel.)
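
To make that cartoon concrete (and it is only a cartoon), here is a hypothetical deterministic F whose buried table error happens to oscillate much faster than the emulator's kernel width, which is exactly the kind of deterministic error that local averaging can cancel:

```python
import numpy as np

def exact(x):
    return np.sin(2 * np.pi * x)

# Hypothetical deterministic forward model: the exact answer plus a small,
# rapidly oscillating error standing in for a flawed lookup table buried inside F.
def F(x):
    return exact(x) + 0.05 * np.sin(80 * np.pi * x)

X = np.linspace(0.0, 1.0, 201)       # design
Y = np.array([F(x) for x in X])      # deterministic, but systematically wrong, outputs

def G(x0, bandwidth=0.02):           # same Gaussian-kernel smoother as above
    w = np.exp(-0.5 * ((x0 - X) / bandwidth) ** 2)
    return np.sum(w * Y) / np.sum(w)

# The table error oscillates much faster than the kernel width, so the
# emulator's local average largely cancels it, while the smooth part survives.
x0s = np.linspace(0.1, 0.9, 500)
err_F = np.array([F(x) - exact(x) for x in x0s])
err_G = np.array([G(x) - exact(x) for x in x0s])
print("RMS model error of y0    (single run):", np.sqrt(np.mean(err_F ** 2)))
print("RMS model error of y0hat (emulator):  ", np.sqrt(np.mean(err_G ** 2)))
```

Whether real model errors behave this conveniently is exactly the speculative part.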

I have seen this claim credibly made, but verifying it is hard -- the minute you find the model error that explains this[*], the model will be fixed and the problem will go away.

[*] E.g., you show a plot of y0hat overlaid on y0, and the people who maintain the forward model ask, "do you have y0 and y0hat labeled correctly?"