They probably just look at the results of the generation.
I mean, would I like an in-depth tour of this? Yes.
But it's a marketing blog article, what do you expect?
> just look at the results of the generation
And? The entire hallucination problem with text generators is "plausible sounding yet incorrect", so how does a human eyeballing it help at all?
I think it's because here there's no single correct answer, so the model is allowed to be fuzzier. You'd still mix in real training data, and maybe more physics-based simulation of course, but it does seem acceptable to synthesize extreme-tail evaluations: there isn't really a "better" way by definition, and you can evaluate the end driving behavior after training.
You can also probably still use it for some kinds of evaluation, since presumably you can detect whether two point clouds intersect.
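To illustrate, a crude intersection check between two point clouds can be done with a pairwise distance threshold. This is just a minimal sketch with made-up data and a hypothetical `clouds_intersect` helper; a real pipeline would use spatial indexing or mesh-based collision checks.

```python
import numpy as np

def clouds_intersect(a, b, threshold=0.1):
    """Return True if any point in cloud `a` lies within `threshold`
    of any point in cloud `b` (brute-force proximity test)."""
    # Pairwise squared distances between every point in a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return bool((d2 < threshold ** 2).any())

# Toy synthetic clouds: one nearly touching pair, one far apart.
car = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pedestrian_near = np.array([[1.05, 0.0, 0.0]])
pedestrian_far = np.array([[10.0, 0.0, 0.0]])

print(clouds_intersect(car, pedestrian_near))  # True
print(clouds_intersect(car, pedestrian_far))   # False
```

For large clouds you'd swap the O(n·m) distance matrix for a KD-tree query, but the basic flag ("did the generated geometry collide?") is cheap to compute.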
In much the same way that LLMs are not perfect at translation but are still widely used for NMT.