I’d like to constrain the LLM’s output by reading the probabilities for the next token, picking the highest-probability token that is also valid in the type system, and using that. OpenAI originally did return the probabilities for the next token, but apparently that made it easy to steal the weights, so they turned that feature off.
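Here is a minimal sketch of the selection step. It assumes the next-token log-probabilities are already available as a dict that maps token strings to log-probabilities. `pick_constrained_token` and `is_int_prefix` are hypothetical names. `is_int_prefix` stands in for the type system and accepts only prefixes of an integer literal. A real type checker would instead have to decide whether a partial program can still be completed into a well-typed one.

```python
import math
import re
from typing import Callable, Dict, Optional


def is_int_prefix(text: str) -> bool:
    """Hypothetical stand-in for a type checker: True if `text` can still
    be extended into an integer literal (optional '-' followed by digits)."""
    return re.fullmatch(r"-?\d*", text) is not None


def pick_constrained_token(
    logprobs: Dict[str, float],
    prefix: str,
    is_valid_prefix: Callable[[str], bool],
) -> Optional[str]:
    """Greedy constrained choice: return the highest-log-probability token
    whose addition keeps the output valid, or None if no candidate is valid."""
    best_token: Optional[str] = None
    best_lp = -math.inf
    for token, lp in logprobs.items():
        # Skip tokens that would push the output outside the allowed type.
        if lp > best_lp and is_valid_prefix(prefix + token):
            best_token, best_lp = token, lp
    return best_token


# "hello" and "x" are more likely but not valid integers, so "4" wins.
candidates = {"hello": -0.1, "4": -1.2, "2": -2.0, "x": -0.5}
print(pick_constrained_token(candidates, "", is_int_prefix))
```

Choosing greedily among the valid tokens is the simplest option. Every chosen token keeps the prefix valid, so the output cannot leave the type. The result may still differ from the most probable valid completion overall, because the choice is made one token at a time. If only a few top tokens are visible and none of them is valid, the sketch returns `None`.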