> more with no single metric meeting all use cases
Is it, though?
Dithering to black-and-white is pretty simple. If the only thing you want to do is maximize detail while preserving accurate brightness, I don't really see a lot of leeway there.
Now sure, you can choose to artistically adjust some tradeoff of less detail for... something? But it feels like there ought to at least be an objectively correct starting point for a metric, no? I'm curious if that really doesn't exist.
You'd have to be more specific about what you mean by detail. E.g. if I had an image which in normal sRGB color space would be encoded as a detailed silhouette of #FEFEFE on a background of #FFFFFF, then one could argue the most detailed output is a hard threshold between the two shades to preserve the shape information. Another person may argue it's not detailed if it throws away 100% of the original brightness information, so it should eschew the spatial information to let the user know the original image was actually near maximally bright. Another person may argue it should be something in between, as that's most visually pleasing to them.
Which of these is "more detailed", and how does that hold for every possible source image and its display intent? I think what you'll find is you can define things like "most contrast" or "most spatial detail" or "most accurate brightness", but generic things like "most detail while preserving brightness" will come out as a judgement call rather than a mathematical criterion everyone will agree is ideal all the time. That doesn't mean such a metric (if well defined) can't be useful, just don't expect it to always be ideal.
Let's say the original image is solid light gray, maybe 99% white. The palette you have for dithering is only pure white and pure black. What's the "best" possible output? Would it be pure white, or would it be 1 black pixel in every 10x10 square? Pure white is more accurate to the "shape" or "details" of the original, but sparse black pixels make a more accurate "color". Whatever your answer is, would it stay the same for 50% white? What about 99.99% black? Where do you draw the line?
Let's say it's ~94% white. I think it's reasonable to have 1 black pixel in every 4x4 square on average -- that doesn't feel too sparse to me. But if it's literally just black pixels spaced on an even grid, that would look like a pattern to most people, which would still give the impression of a detail that isn't there. So how do you space them out? More even spacing gives the impression of a pattern that isn't there, but more random spacing gives the impression of "clustering" and thus a texture that isn't there. There's literally no solution other than subjectively choosing a tradeoff between patterns and clumps.
If you're preserving accurate brightness, then yes, obviously 99% white needs 1 black pixel out of 100 on average, accounting for gamma. There's no line to draw. That's already part of the definition. (You can increase contrast for artistic effect, but that's a different conversation.)
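For concreteness, here's roughly what "accounting for gamma" works out to, assuming "99% white" means the sRGB-encoded value and the display averages in linear light (a minimal Python sketch; the helper names are mine, not from any library):

    def srgb_to_linear(v):
        # Standard sRGB transfer function, v in [0, 1].
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    def black_fraction(srgb_gray):
        # Fraction of pixels to set black so the average *linear* brightness
        # of a black/white dither matches a flat gray of this encoded value.
        return 1.0 - srgb_to_linear(srgb_gray)

    print(black_fraction(0.99))  # ~0.023, i.e. roughly 1 black pixel per 44

If "99% white" instead means linear light, it's exactly 1 in 100. Either way the density falls straight out of the definition.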
And it doesn't matter if you have patterns or not. What matters is that when you look at different dithering algorithms, it's incredibly clear that some (like truly random noise) make detail very difficult to see, while others (like error diffusion) make it much easier to see.
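To make that comparison concrete, both ends of it are only a few lines each. This is a minimal sketch (Python/numpy, ignoring gamma and serpentine scanning), not any particular reference implementation:

    import numpy as np

    def dither_random(img):
        # White-noise dithering: threshold each pixel against uniform noise.
        # img is a 2D float array of linear brightness in [0, 1].
        return (img > np.random.random(img.shape)).astype(float)

    def dither_floyd_steinberg(img):
        # Classic Floyd-Steinberg error diffusion, simple left-to-right scan.
        out = img.astype(float)
        h, w = out.shape
        for y in range(h):
            for x in range(w):
                old = out[y, x]
                new = 1.0 if old >= 0.5 else 0.0
                out[y, x] = new
                err = old - new
                # Push the quantization error onto unprocessed neighbours,
                # so local averages stay close to the original.
                if x + 1 < w:
                    out[y, x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        out[y + 1, x - 1] += err * 3 / 16
                    out[y + 1, x] += err * 5 / 16
                    if x + 1 < w:
                        out[y + 1, x + 1] += err * 1 / 16
        return out

The difference you can see in the examples falls out of that error-push step: random dithering makes each pixel's error independent, while error diffusion forces neighbourhood averages to track the original.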
Just look at: https://en.wikipedia.org/wiki/Dither#Algorithms and observe the difference between "Random", "Ordered (void-and-cluster)", and "Floyd-Steinberg". The level of resolvable detail is obviously increasing. It's not subjective, it's literally the level of signal vs noise. How do you quantify that as a metric?
E.g. one way would be to calculate, in every 4x4 block of pixels, how far the dithered image's brightness deviates from the original's, and minimize the sum of those deviations, or the sum of their squares, or something. But 4x4 is totally arbitrary, so I'm looking for something more elegant and generalizable. The point is, it shouldn't be dependent on human perception. Detail is detail. Signal is signal. So how do you prove which algorithm preserves the most detail, or prove that an algorithm preserves the maximum possible detail?
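As a sketch of the kind of thing I mean (Python/numpy; the block size and the squared error are exactly the arbitrary choices I'd like to get rid of):

    import numpy as np

    def block_error(original, dithered, block=4):
        # Sum of squared differences between block-averaged brightnesses.
        # Both inputs are 2D float arrays of linear brightness in [0, 1];
        # dimensions are assumed to be multiples of `block` for simplicity.
        h, w = original.shape
        shape = (h // block, block, w // block, block)
        orig_means = original.reshape(shape).mean(axis=(1, 3))
        dith_means = dithered.reshape(shape).mean(axis=(1, 3))
        return np.sum((orig_means - dith_means) ** 2)

Swapping the box average for a blur, or summing the same thing over several block sizes, would be one way to generalize it.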
Let's say you have a photo of a starry night sky, and a photo of a slightly brighter sky with no visible stars. If you do "fully accurate on average" dithering, the two dithered outputs would be essentially identical. But in that context, the difference between "sky with dots" and "sky without dots" is more important than the difference between "dark sky" and "very slightly less dark sky". In that context, I would say a dithering algorithm that discards the very slight error in shade in favor of better accuracy in texture is objectively better.
On that wikipedia page, compare Floyd–Steinberg vs Gradient-based. In my opinion, gradient-based better preserves detail in high-contrast areas (e.g. the eyelid), whereas FS better preserves detail in low-contrast areas (e.g. the jawline between the neck and the cheek).
You're talking about artistic tradeoffs. That's fine.
I'm asking: how do you quantitatively measure detail in the first place, so you can even define the tradeoffs quantitatively?
You say that, in your opinion, different algorithms preserve detail better in different areas. My question is, how do we define that numerically so it's not a matter of opinion? If it depends on contrast levels, you can then test with images of different contrast levels.
It doesn't seem unreasonable that we should be able to define metrics for these things.