Hacker News

If you read the entire article, they do that at the end of the article.

Not quite. They build the entire table upfront, whether any individual entry is needed or not.

Making it an on-demand cache instead is a neat next step. Whether it helps or hurts depends on the actual input: If the input image uses every pixel value anyway, the additional overhead of checking whether the table entry is computed is just unnecessary extra with no value.

But if a typical image only uses a few pixel values, then the amortized cost of just calculating the few needed table entries may very well be significantly below the cost of the current algorithm.

If images are somewhere in between, or their characteristics not well known, then simply trying out both approaches with typical input data is a good approach!

Unless you’re perfectly happy with 0.2 seconds, for example because the runtime of some other parts take so long that dwarfs those 0.2s, then why bother.

egypturnash 4 days ago [ - ]

Even if you could get the "is this cell of the table calculated?" check down to one cycle (which you can't, 6502 branch instructions take 2-4 cycles, plus a few more operations to check the table calculation status), it's still gonna add one cycle to a hot loop that's done 43k times. Pre-calcing the entire table only burns 6k cycles, and can probably be done while displaying some transitional effect that requires very little CPU.

If we presume this scaling operation will be done more than one time (which it probably will be if it's getting this level of optimization) then it gets even worse.

anyfoo 4 days ago [ - ]

Good analysis. I guess I lost my cycle-counting foo, because on modern CPUs cycle-counting is a) infeasible on a superscalar, speculative, out-of-order CPU, and b) even if you did manage to do it, it hardly matters when a CPU cycle is less than a nanosecond, but any memory access that slips through the caches means many orders of magnitude higher latency.

Of course none of that applies here, but it colors the way you think about things…

egypturnash 3 days ago [ - ]

meanwhile the 6502 is all "cache? what's cache", every 6502 optimization is basically a pessimization for modern CPUs.

colinlm 3 days ago [ - ]

Every entry will be used, it's to scale X/Y coordinates :) And indeed, those 0.2s are dwarfed by the rest of the algorithm (10 seconds)

rbanffy 3 days ago [ - ]

Good point. I was thinking of pixel values.