LLMs have no concept of what makes an output "good". Or to put it another way: if a model generates an image with jumbled numbers, it does so because that was the most likely output given its weights, and by that measure it was a "good" image.
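What "most likely" means here can be sketched in a few lines. This is a toy illustration, not any real model's code: the vocabulary and logit values are made up, and real models score tens of thousands of tokens per step. The point is that the model only ranks outputs by probability; it has no separate notion of quality.

```python
import math

def softmax(logits):
    # Convert raw model scores (logits) into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary with made-up logits.
vocab = ["7", "2", "cat", "%"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
# Greedy decoding: pick whichever token the weights score highest,
# regardless of whether a human would call the result "good".
best = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
print(best)
```

If the weights happen to assign the highest score to a jumbled or nonsensical token, that is exactly what gets emitted.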