Interesting article. I tend to use

- i = min(floor(f * 256), 255) (from float to uint8)

- f = i / 255 (from uint8 to float)

Basically a mix of the 2 approaches mentioned in the article.

For every integer in [0, 255], a uint8 -> float -> uint8 round trip gives back the same value.
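
A quick sketch in Python to check that round-trip claim. The names `encode` and `decode` are just mine for illustration, and this assumes double-precision floats:

```python
import math

def encode(f: float) -> int:
    """float in [0, 1] -> uint8: floor into 256 buckets, clamp 1.0 to 255."""
    return min(math.floor(f * 256), 255)

def decode(i: int) -> float:
    """uint8 -> float in [0, 1], so that 0 -> 0.0 and 255 -> 1.0."""
    return i / 255

# uint8 -> float -> uint8 for every possible code
round_trip_ok = all(encode(decode(i)) == i for i in range(256))
print(round_trip_ok)
```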

--

Edit: I wondered how much jitter I can add to the float and still get the same uint8 value back, while 0 -> 0.0 and 255 -> 1.0 still map exactly.

With my approach at the top, the jitter margin is only 1/65280: for i = 1, f = 1/255 sits just 1/255 - 1/256 = 1/65280 above the bucket edge at 1/256.

But with the article's approach

- i = floor(f * 255 + 0.5)

- f = i / 255

the maximum jitter margin is 1/510, i.e. half a quantization step, which is much better.
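
Here is roughly how those two margins can be checked with exact rational arithmetic. This is a sketch under my own assumptions: the helper names are made up, jitter that would push the value outside [0, 1] is ignored, and since the upper bucket edge is exclusive the margin is a bound that the jitter has to stay strictly below.

```python
from fractions import Fraction

def margin_mixed(i: int) -> Fraction:
    """Distance from i/255 to the nearest edge of the bucket [i/256, (i+1)/256)."""
    f = Fraction(i, 255)
    dists = []
    if i > 0:                      # below 0 there is nothing to fall into
        dists.append(f - Fraction(i, 256))
    if i < 255:                    # the min() clamp absorbs anything above
        dists.append(Fraction(i + 1, 256) - f)
    return min(dists)

def margin_rounding(i: int) -> Fraction:
    """Distance from i/255 to the nearest edge of [(i-0.5)/255, (i+0.5)/255)."""
    f = Fraction(i, 255)
    dists = []
    if i > 0:
        dists.append(f - Fraction(2 * i - 1, 510))
    if i < 255:
        dists.append(Fraction(2 * i + 1, 510) - f)
    return min(dists)

worst_mixed = min(margin_mixed(i) for i in range(256))
worst_rounding = min(margin_rounding(i) for i in range(256))
print(worst_mixed, worst_rounding)
```

The worst cases for the mixed version are the codes next to the ends (i = 1 and i = 254), where i/255 has drifted almost all the way to a bucket edge.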

It's worth pointing out that the article explicitly calls out the mixed technique you describe first:

> Finally, one should never mix the encode and decode steps of the two quantizers. That’s just broken code. It’s an easy mistake to make, though.

This is what I do for the former (float to uint8):

    floor( nextafter( 256, 255 ) * value )
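
For anyone who wants to try it, a minimal Python sketch of the same trick, assuming double precision and Python 3.9+ for `math.nextafter`. The names `SCALE`, `encode_nextafter` and `encode_min` are only for illustration:

```python
import math

# Largest double strictly below 256, so SCALE * 1.0 floors to 255.
SCALE = math.nextafter(256.0, 255.0)

def encode_nextafter(f: float) -> int:
    """float in [0, 1] -> uint8 without an explicit clamp."""
    return math.floor(SCALE * f)

def encode_min(f: float) -> int:
    """Reference version with the explicit min() clamp."""
    return min(math.floor(256 * f), 255)

# Every i/255 still round-trips, and 1.0 maps to 255.
round_trip_ok = all(encode_nextafter(i / 255) == i for i in range(256))
print(round_trip_ok, encode_nextafter(1.0))
```

One caveat I noticed while sketching this: an input that lands exactly on a bucket boundary, such as 0.5, comes out one code lower than with the min form (127 rather than 128), at least in double precision.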

Oh, very nice idea for getting rid of the min operator.