Given how much you can use Codex on their $200 plan, I'm virtually certain that it's subsidized.

As to why, I think it's partly because people who are willing to pay that much per month are much more likely to be using it heavily on "serious" tasks, which is, of course, a goldmine for training data. Even if you can't use the inputs directly for training, just looking at various real-world issues and how agents handle them (or fail to) is valuable, especially now that all the low-hanging fruit has already been picked.

I wouldn't even be surprised if the $20 users are actually subsidizing the $200 users.