The square artifacts in the dithered image are caused by the distribution not making second passes over pixels that already had error distributed to them. This is a byproduct of the "custom" approach the OP uses: they've traded off (greater) individual colour error for general picture cohesion.
Me, I adjusted Atkinson a few years ago as I prefer the "blown out" effect: https://github.com/KodeMunkie/imagetozxspec/blob/master/src/...
A similar custom approach to prevent second-pass diffusion is in the code too; it is a slightly different implementation - it processes the image in 8x8 pixel "attribute" blocks, and the error does not go beyond those bounds. The same artifacts occur there too, but are more distinct as a consequence. https://github.com/KodeMunkie/imagetozxspec/blob/3d41a99aa04...
Nb. 8x8 is not arbitrary: the ZX Spectrum computer this is used for only allowed 2 colours in every 8x8 block, so seeing the artifact on a real machine is less important, as the whole image potentially had 8x8 artifacts anyway.
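For anyone curious what the attribute-block restriction looks like, here is a minimal sketch (not the linked Java implementation; Floyd-Steinberg weights are used purely for illustration): any error that would cross an 8x8 block boundary is simply discarded, which is also why the artifacts come out more distinct.

    ;; Sketch only, not the linked implementation: error diffusion where error
    ;; is dropped instead of crossing an 8x8 attribute-block boundary.
    ;; `pix` is a row-major vector of linear grayscale values in [0,1].
    (define (dither-in-blocks! pix w h)
      (for* ([y (in-range h)]
             [x (in-range w)])
        (define i (+ x (* y w)))
        (define old (vector-ref pix i))
        (define new (if (< old 0.5) 0.0 1.0))
        (vector-set! pix i new)
        (define err (- old new))
        ;; Floyd-Steinberg offsets/weights, purely for illustration
        (for ([o (in-list '((1 0 7/16) (-1 1 3/16) (0 1 5/16) (1 1 1/16)))])
          (define nx (+ x (car o)))
          (define ny (+ y (cadr o)))
          (when (and (< -1 nx w) (< ny h)
                     ;; stay inside the same 8x8 cell, otherwise drop the error
                     (= (quotient nx 8) (quotient x 8))
                     (= (quotient ny 8) (quotient y 8)))
            (let ([j (+ nx (* ny w))])
              (vector-set! pix j (+ (vector-ref pix j) (* (caddr o) err))))))))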
Great read and nice drawings!
I made some impractical dithering algorithms a while ago, such as distributing the error to far away pixels or distributing more than 100% of the error: https://burkhardt.dev/2024/bad-dithering-algorithms/
Playing around with the distribution matrices and exploring the resulting patterns is great fun.
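If anyone wants to play along with that idea, the diffusion pattern really is just data; a hypothetical kernel like the one below (made-up name, not code from the linked post) pushes error only to far-away pixels and deliberately sums to more than 100%:

    ;; Hypothetical "bad" diffusion kernel as (dx dy weight) triples:
    ;; only far neighbours, and the weights sum to 9/8 (> 100% of the error).
    (define far-and-too-much
      '((2 0 3/8) (0 2 3/8) (2 2 3/8)))

Swapping triples like these into an ordinary error-diffusion loop is all it takes to explore the resulting patterns.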
Nice ! Thank you for the link! :)
The dithered images have the wrong brightness mapping.
The reason is that the described approach will estimate the error correction term wrong as the input RGB value is non-linear sRGB.
The article doesn't mention anything about this so I assume the author is oblivious to what color spaces are and that an 8bit/channel RGB value will most likely not represent linear color.
This is not bashing the article; most people who start doing anything with color in CG w/o reading up on the resp. theory first get this wrong.
And coming up with your own dither is always cool.
See e.g. [1] for an in-depth explanation why the linearization stuff matters.
[1] http://www.thetenthplanet.de/archives/5367
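Concretely, the linearization being talked about here is just the sRGB transfer function applied per channel before any averaging or error arithmetic, plus its inverse if you ever need to go back to non-linear values. A minimal sketch in Racket (helper names are mine, not from the article):

    (require racket/math) ; for exact-round

    ;; sRGB decode: 8-bit channel value (0-255) -> linear-light value in [0,1]
    (define (srgb->linear v)
      (let ([c (/ v 255.0)])
        (if (<= c 0.04045)
            (/ c 12.92)
            (expt (/ (+ c 0.055) 1.055) 2.4))))

    ;; sRGB encode: linear-light value in [0,1] -> 8-bit channel value
    (define (linear->srgb c)
      (exact-round
       (* 255
          (if (<= c 0.0031308)
              (* 12.92 c)
              (- (* 1.055 (expt c (/ 1 2.4))) 0.055)))))

For instance, (srgb->linear 128) is roughly 0.22: middle grey in sRGB is only about 22% linear light, which is why averaging or diffusing error directly on the raw 8-bit values skews the brightness.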
Hi, OP here!
Thank you so much for pointing this out! Just read the post you linked and did some of my own research on the non-linearity of sRGB - really fascinating stuff :)
For now, I've acknowledged this limitation of my implementation so that any new readers are aware of it: https://amanvir.com/blog/writing-my-own-dithering-algorithm-...
But I'll definitely revisit the article to add proper linearization to my implementation when I have the time. Thanks again for mentioning this!
You're already 99% of the way there, you just have the order of operations wrong.
What you're doing is sRGB -> linear perceived luminance space -> sRGB (greyscale, where R=G=B) -> dithering
When you should be applying dithering in the linear perceived luminance space, then converting the dithered image back into sRGB space.
It's not about order of operations; there is simply no linearization of sRGB happening or mentioned in the article.
I.e. it's not the ordering of operations but the absence of an operation that is the issue.
> ... -> linear perceived luminance space -> ...
"Perceptual" luminance is not a concept used anywhere. Luminance is, by definition, linear [1].
When we talk about color, something can either be linear or perceptual. Not both. "Perceptual" refers to "how humans perceive something" and the human visual system is non-linear.
Relative luminance [2] (what people refer to as grayscale) and what we're dealing with in the post, is still linear. So the order of operations is:
sRGB non-linear -> sRGB linear -> grayscale -> dither
After the dithering no conversion back to sRGB is necessary for the case at hand because all pixels are black or white and the sRGB transfer function for the inputs 0 and 1 has the outputs 0 and 1.
See also [3]. You would only re-apply an sRGB linear -> sRGB non-linear transform if you didn't dither to black and white after the grayscale conversion. I.e. in the case at hand: if you dithered the 8-bit grayscale to n-bit grayscale with 1 < n < 8.
[1] https://en.wikipedia.org/wiki/Luminance
[2] https://en.wikipedia.org/wiki/Relative_luminance
[3] https://en.wikipedia.org/wiki/Grayscale#Colorimetric_(percep...
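Spelled out as a sketch (reusing the srgb->linear helper from earlier; this is not the article's code, and the luminance weights are the standard sRGB/Rec. 709 ones):

    ;; Per-pixel preparation in the order described above:
    ;; decode each 8-bit sRGB channel to linear light, then take relative luminance.
    (define (pixel->linear-gray r g b)
      (+ (* 0.2126 (srgb->linear r))
         (* 0.7152 (srgb->linear g))
         (* 0.0722 (srgb->linear b))))

The dither step then thresholds/diffuses these linear values, and because every output pixel ends up as exactly 0.0 or 1.0, no linear -> sRGB re-encode is needed afterwards. If you quantised to more than two grey levels instead, you would re-encode each quantised value with linear->srgb, as described above.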
You're absolutely right. I skimmed and misread.
To some extent I think this comes down to a preference, like many things in dithering. If the black and white results look good, that may be the right answer!
I've played with dithering tools that provide both options, and I prefer the output of the simple version.
Haha, yeah I was kind of thinking that as well! Like with different error-diffusion patterns, one method may be more visually appealing than the other.
Although, with either approach, I definitely feel that the fact that sRGB is non-linear should be acknowledged, and that’s something I was completely unaware of. So, I’m happy I learned something new today :)
Are dithering patterns proportional in perceived brightness to a uniform grey for any percentage of set pixels?
I can see them not being linearly proportional to a smooth perceptual grey gradient as the ratio of black to white changes, but I suspect it might change also with the clustering of light and dark at the same ratio.
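As a rough sanity check (assuming an ideally calibrated display and that the eye simply averages over the pattern): a pattern with half its pixels white averages to 0.5 in linear light, which encodes to about sRGB 188, noticeably lighter than the 128 "middle grey" one might guess. Using the linear->srgb helper from earlier:

    ;; Equivalent uniform sRGB grey for a pattern with `frac` of its pixels white,
    ;; under a simple spatial-averaging model of perception.
    (define (pattern-grey frac)
      (linear->srgb frac))

    (pattern-grey 1/2) ; => 188, not 128

Whether clustering at the same black/white ratio shifts the perceived brightness further is a genuinely psychovisual question that this simple averaging model can't answer.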
(Kudos on doing this in Racket. Besides being a great language to learn and use, using Racket (or another Scheme, or other less-popular language) is a sign that the work comes from genuine interest, not (potentially) just pursuit of keyword employability.)
Side note on Lisp formatting: The author is doing a mix of idiomatic cuddling of parentheses, but also some more curly-brace-like formatting, and then cuddling a trailing small term so that it doesn't line up vertically (like people sometimes do in other languages, e.g. a numeric constant after a multi-line closure argument in a timer or event-handler registration).
One thing some Lisp people like about the syntax is that parts of complex expression syntax can line up vertically, to expose the structure.
For example, here, you can clearly see that the `min` is between 255 and this big other expression:
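(The article's actual expression isn't reproduced here; the snippet below is a made-up Racket expression of the same shape, purely to illustrate the alignment.)

    ;; Hypothetical, not the article's code: 255 and the whole second expression
    ;; line up vertically as the two arguments of `min`.
    (min 255
         (exact-round (+ old-value
                         (* 7/16 quant-error))))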
Or, if you're running out of horizontal space, you might break the expression over more lines; or you might decide those comments should be part of the language and bind the pieces to names.

One of my teachers would still call those constants "magic numbers", even when their purpose is obvious in this very restricted context, and insist that you bind them to names in the language. Left as an exercise to the reader.

This reminds me a bit of the octree quantization implementation I hacked up to improve the speed of generating Racket's animated GIFs.
* https://github.com/racket/racket/commit/6b2e5f4014ed95c9b883...
* https://github.com/racket/racket/commit/f2a1773422feaa4ec112...
For anyone who hasn't seen it yet, here's Lukas Pope's forum post on finding a dithering approach that works well with animations:
https://forums.tigsource.com/index.php?topic=40832.msg136374...
You may also enjoy Surface Stable Dithering: https://youtube.com/watch?v=HPqGaIMVuLs
Somewhat related and worth watching:
Surface-stable fractal dithering explained
https://youtu.be/HPqGaIMVuLs
There's a follow-up video to that one.
That was mind-blowing
One style suggestion: nested `for` expressions can be combined into a single `for*`, helping reduce indentation depth:
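(The original snippet from the article isn't reproduced here; the comparison below uses placeholder names, just to show the transformation.)

    ;; Nested `for`s...
    (for ([y (in-range height)])
      (for ([x (in-range width)])
        (dither-pixel! x y)))

    ;; ...collapse into a single `for*`, which still iterates x within y:
    (for* ([y (in-range height)]
           [x (in-range width)])
      (dither-pixel! x y))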
I did the same like 2 weeks ago. In Rust. ^^
I'm still trying to improve it a little. https://git.ache.one/dither/tree/?h=%f0%9f%aa%b5
I didn't publish it because it's hard to actually put dithered images on the web: you can't resize a dithered image, so you have to dither it on the fly. That's why there are some artifacts in the images in the article. I still need to learn about dithering.
Reference: https://sheep.horse/2022/12/pixel_accurate_atkinson_ditherin...
Cool links about dithering:
- https://beyondloom.com/blog/dither.html
- https://blog.maximeheckel.com/posts/the-art-of-dithering-and...
Why can't you resize it? Because of the filtering? You can turn that off in css, right?
I am the author of the sheep.horse link above, although here[0] is an updated link.
Even with filtering turned off you get slightly incorrect results, especially if you are resizing down where aliasing might completely ruin your image. Harsh black-and-white dithering is very susceptible to scaling artifacts.
If you want pixel perfect dithering for the screen you are viewing the page on, you need to do it client side. Whether or not this is worth the bother is up to you.
[0] https://sheep.horse/2023/1/improved_web_component_for_pixel-...
Note that this isn't a problem for blue noise based dithering; nevertheless, it's better if dithering is the last operation, and the result displayed 1:1 with pixel output.
> Atkinson dithering is great, but what's awesome about dithering algorithms is that there's no definitive "best" algorithm!
I've always wondered about this. Sure, if you're changing the contrast then that's a subjective change.
But it's easy to write a metric to confirm the degree to which brightness and contrast are maintained correctly.
And then, is it really impossible to develop an objective metric for the level of visible detail that is maintained? Is that really psychovisual and therefore subjective? Is there really nothing we can use from information theory to calculate the level of detail that emerges out of the noise? Or something based on maximum likelihood estimation?
I'm not saying it has to be fast, or that we can prove a particular dithering algorithm is theoretically perfect. But I'm surprised we don't have an objective, quantitative measure to prove that one algorithm preserves more detail than another.
I think the problem is less with the possibility of developing something to maximize a metric (though that could be hard depending on how you define the metric) and more with no single metric meeting all use cases, so you're not going to end up with a definitive answer anyway. Some images may be better suited to an algorithm whose metric preserves the most literal detail; others to one preserving the most psychovisual detail; others to something which optimizes visibility even if it's not as true to the source. No one metric will definitively be the best thing to measure against for every image and use case fed to it.
You find the same in image resizing. No one algorithm can be the definitive best for e.g. pixel art and movie upscaling. At the same time nobody can agree what the best average metric of all of that could be. Of course if you define a non-universally important metric as the only thing which matters you can end up with certain solutions like sinc being mathematically optimal.
It does lead to the question though: are there well defined objective metrics of dithering quality for which we don't have a mathematically optimal answer?
> more with no single metric meeting all use cases
Is it, though?
Dithering to black-and-white is pretty simple. If the only thing you want to do is maximize detail while preserving accurate brightness, I don't really see a lot of leeway there.
Now sure, you can choose to artistically adjust some tradeoff of less detail for... something? But it feels like there ought to at least be an objectively correct starting point for a metric, no? I'm curious if that really doesn't exist.
You'd have to be more specific about what you mean by detail. E.g. if I had an image which in normal sRGB color space would be encoded as a detailed silhouette of #FEFEFE on a background of #FFFFFF, then one could argue the most detailed result is a hard threshold between the two shades to preserve the shape information. Another person may argue it's not detailed if it throws away 100% of the original brightness information, so it should eschew the spatial information to let the user know the original image was actually near maximally bright. Another person may argue it should be something in between, as that's most visually pleasing to them.
Which of these is "more detailed", and how does that hold for every possible source image and its display intent? I think what you'll find is you can define things like "most contrast" or "most spatial detail" or "most accurate brightness", but generic things like "most detail while preserving brightness" will come out as a judgement call rather than a mathematical criterion everyone will agree is ideal all the time. That doesn't mean such a metric (if well defined) can't be useful, just don't expect it to always be ideal.
Let's say the original image is solid light gray, maybe 99% white. The palette you have for dithering is only pure white and pure black. What's the "best" possible output? Would it be pure white, or would it be 1 black pixel in every 10x10 square? Pure white is more accurate to the "shape" or "details" of the original, but sparse black pixels make a more accurate "color". Whatever your answer is, would it stay the same for 50% white? What about 99.99% black? Where do you draw the line?
Let's say that it's ~94% white. I think it's reasonable to have 1 black pixel in every 4x4 square on average -- that doesn't feel too sparse to me. But if it's literally just black pixels spaced on an even grid, that would look like a pattern to most people, which would still give the impression of a detail that isn't there. So how do you space them out? More even spacing gives the impression of a pattern that isn't there, but more random spacing gives the impression of "clustering" and thus a texture that isn't there. There's literally no solution other than subjectively choosing a tradeoff between patterns and clumps.
If you're preserving accurate brightness, then yes obviously 99% white needs 1 black pixel out of 100 on average. Accounting for gamma. There's no line to draw. That's already part of the definition. (You can increase contrast for artistic effect, but that's a different conversation.)
And it doesn't matter if you have patterns or not. What matters is that when you look at different dithering algorithms, it's incredibly clear that some (like truly random noise) make detail very difficult to see, while others (like error diffusion) make it much easier to see.
Just look at: https://en.wikipedia.org/wiki/Dither#Algorithms and observe the difference between "Random", "Ordered (void-and-cluster)", and "Floyd-Steinberg". The level of resolvable detail is obviously increasing. It's not subjective, it's literally the level of signal vs noise. How do you quantify that as a metric?
E.g. one way would be to calculate the standard deviation of the brightness in the dithered image from the original in every 4x4 set of pixels, to minimize their sum, or the sum of their squares, or something. But 4x4 is totally arbitrary, so I'm looking for something more elegant and generalizable. The point is, it shouldn't be dependent on human perception. Detail is detail. Signal is signal. So how do you prove which algorithm preserves the most detail, or prove that an algorithm preserves maximum possible detail?
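For what it's worth, that block-averaged comparison is easy to write down. A rough sketch (block size still as arbitrary as in the comment; `orig` and `dith` are assumed to be row-major vectors of linear brightness values, which is my framing, not something from the article):

    ;; Sum of squared differences between the mean brightness of the original and
    ;; the dithered image over every `block` x `block` tile (clipped at the edges).
    (define (block-error orig dith w h [block 4])
      (for*/sum ([by (in-range 0 h block)]
                 [bx (in-range 0 w block)])
        (define x-end (min w (+ bx block)))
        (define y-end (min h (+ by block)))
        (define (tile-mean pix)
          (/ (for*/sum ([y (in-range by y-end)]
                        [x (in-range bx x-end)])
               (vector-ref pix (+ x (* y w))))
             (* (- x-end bx) (- y-end by))))
        (expt (- (tile-mean orig) (tile-mean dith)) 2)))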
Let's say you have a photo of a starry night sky, and a photo of a slightly brighter sky with no visible stars. If you do "fully accurate on average" dithering, the dithered output would be identical. But in that context, the difference between "sky with dots" and "sky without dots" is more important than the difference between "dark sky" and "very slightly less dark sky". In that context, I would say a dithering algorithm that discards the very slight error in shade in favor of better accuracy in texture is objectively better.
On that wikipedia page, compare Floyd–Steinberg vs Gradient-based. In my opinion, gradient-based better preserves detail in high-contrast areas (e.g. the eyelid), whereas FS better preserves detail in low-contrast areas (e.g. the jawline between the neck and the cheek).
You're talking about artistic tradeoffs. That's fine.
I'm asking, how do you quantitatively measure in the first place so you can even define the tradeoffs quantitatively?
You say how in your opinion, different algorithms preserve detail better in different areas. My question is, how do we define that numerically so it's not a matter of opinion? If it depends on contrast levels, you can then test with images of different contrast levels.
It doesn't seem unreasonable that we should be able to define metrics for these things.
> And then, is it really impossible to develop an objective metric for the level of visible detail that is maintained? Is that really psychovisual and therefore subjective? Is there really nothing we can use from information theory to calculate the level of detail that emerges out of the noise? Or something based on maximum likelihood estimation?
I think you'll find something similar with audio, image, and video compression algorithms in general. The majority of these are explicitly designed to spend more bits on the parts that humans care about and fewer bits on the parts we won't notice. E.g. MP3 uses perceptual coding/masking: when there's loud sound at frequency f0 and a quieter sound at nearby f1=f0+(small delta f) most people won't hear the sound at f1 so the codec just throws it away. You'd be able to see the difference on a spectrogram but you wouldn't be able to hear it.
The thresholding should be done in linear space I think, not directly on the sRGB encoded values.
Also I think the final result has some pretty distracting structured artifacts compared to e.g. blue noise dithering.
Wouldn't it make more sense to display the samples at 100% in the article? Had to open the images in a new tab to fully appreciate the dithering.
Thanks for pointing this out. The embedded scaled version (which isn't clickable) had artifacts for me, and wasn't very nice.
I love the small image previews to the left of the lines of code loading and saving images. Which editor is this?
I love them too :)
Visual Studio Code with this extension: https://marketplace.visualstudio.com/items/?itemName=kisstko...
This is awesome! But be careful, if you dig much further you're going to get into blue noise, which is a very deep rabbit hole.
I'll start off admitting to not digging into the article yet, but is this something that can be broken up and done in parallel? I'm an Elixir fanboy so I try to parallelize anything I can, either to speed it up or because I'm an Elixir fanboy.
Most image processing can be multithreaded, you just have to chunk the image and send each chunk to a thread. You might have to pad the chunks based on the processing you're doing, e.g. convolution, which needs each pixel's neighboring pixels to work.
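One caveat worth adding: classic error diffusion is hard to parallelise this way because each pixel's result depends on error pushed from previously processed pixels, whereas per-pixel steps (linearization, grayscale conversion, ordered/threshold dithering) chunk cleanly. A rough sketch of the per-pixel case using Racket futures (the chunking scheme and names are mine, not from the article):

    (require racket/future)

    ;; Split the rows into bands and threshold each band in a future.
    ;; Only valid for per-pixel operations; no error crosses pixel boundaries here.
    (define (parallel-threshold! pix w h [bands 4])
      (define band-height (ceiling (/ h bands)))
      (define fs
        (for/list ([b (in-range bands)])
          (future
           (lambda ()
             (for* ([y (in-range (* b band-height)
                                 (min h (* (+ b 1) band-height)))]
                    [x (in-range w)])
               (define i (+ x (* y w)))
               (vector-set! pix i
                            (if (< (vector-ref pix i) 0.5) 0.0 1.0)))))))
      (for-each touch fs))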
I think implementing a dithering algorithm is one of the most satisfying projects, because it is fun, small(ish) and you know when you are done.
Of course, unless you are trying to implement something completely insane like Surface-Stable Fractal Dithering https://www.youtube.com/watch?v=HPqGaIMVuLs