Interesting approach. It doesn't even introduce an extra rounding error, because converting from 32-bit XYB to RGB should be similar to converting from 8-bit YUV to RGB.
However, when decoding an 8-bit-quality image as 10-bit or 12-bit, won't this strategy just fill the two least significant bits with noise?
Could be noise, but finding a smooth image that rounds to a good enough approximation of the original is quite useful. If you see a video player talk about debanding, it is exactly that.
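To make the "smooth image that rounds to the original" idea concrete, here is a toy 1-D sketch in Python. It is not how JPEG XL or any particular video player does it; the function names `deband` and `requantize`, the 3-tap averaging filter, and the 0.49 margin are all assumptions chosen for illustration. It repeatedly blurs the signal and then clamps each sample back into the interval that still rounds to its original 8-bit value.

```python
import math


def deband(samples, iterations=50, margin=0.49):
    """Toy debanding: smooth a quantized 1-D signal while keeping every
    sample within +/- margin of its original integer value, so that
    rounding the result reproduces the input exactly."""
    lo = [s - margin for s in samples]
    hi = [s + margin for s in samples]
    x = [float(s) for s in samples]
    n = len(x)
    for _ in range(iterations):
        new = x[:]
        for i in range(n):
            # 3-tap box blur with edge replication (an arbitrary choice).
            left = x[max(i - 1, 0)]
            right = x[min(i + 1, n - 1)]
            avg = (left + x[i] + right) / 3.0
            # Project back into the interval that rounds to samples[i].
            new[i] = min(max(avg, lo[i]), hi[i])
        x = new
    return x


def requantize(values, scale=1):
    """Round to the nearest integer after scaling (scale=4 maps an
    8-bit range onto a 10-bit range)."""
    return [int(math.floor(v * scale + 0.5)) for v in values]


# A hard one-level step, as it would appear in a banded 8-bit gradient.
banded = [10] * 8 + [11] * 8
smooth = deband(banded)
max_step_before = max(abs(b - a) for a, b in zip(banded, banded[1:]))
max_step_after = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
```

Here `requantize(smooth)` gives back `banded`, so nothing is lost at 8 bits, while `requantize(smooth, scale=4)` yields 10-bit values derived from the smoothed signal rather than from random low bits. A blur like this would also soften real edges that fit inside the rounding interval, which is why practical filters are more selective.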
I don't know if JPEG XL constrains solutions to be smooth.
I believe they constrain to piecewise smooth (i.e. don't smooth out edges but do smooth out noise).