It seems like this could be solved by partial structured output, where the structure of the JSON itself is constrained, but the values of the JSON entries is not (so even if "quantity" here is set to int, the model can output "52.2"). Of course, we would need additional parsing, but I think it's a fair compromise.

And about structured outputs messing with chain-of-thought... Is CoT really used with normal models nowadays? I think that if you need CoT you might as well use a reasoning model, and that solves the problem.