So the model generates code, and let's say it is wrongly typed, we then take the rightly typed version and use cross entropy between them? Is that right? That just sounds like the typical training, unless you can somehow take arbitrary code that the model generated and automatically find the rightly typed version, so you won't need a dataset for it
Rather than letting the model generate arbitrary code and type-checking it afterward, the author wants to pre-restrict the output with templates that are well-typed by construction and only let the model make choices between valid alternatives in that restricted output space.