Hacker News

I wonder if this can be categorized as galactic algorithm. I can't imagine systems where bulk of processing goes into integer to decimal string conversion but maybe there are such.

https://en.wikipedia.org/wiki/Galactic_algorithm

oersted 2 days ago [ - ]

My understanding of a Galactic Algorithm is that it has better performance scaling based on input size/complexity, but its overhead is such that it will not actually be faster unless you use it for impracticality large inputs.

I don’t think it has much to do with the case of an algorithm that offers a faster solution to a problem that is rarely a bottleneck (not sure if that’s true in this case anyway).

Tuna-Fish a day ago [ - ]

It takes a substantial amount of time when emitting lots of numbers in JSON, happens very commonly.

And this algorithm has low constant costs, and does not take dramatically more icache than the simple versions. There is no reason not to use this if your compile target can handle avx-512.

breppp a day ago [ - ]

isn't avx-512 one of the more poorly supported set?

Tuna-Fish a day ago [ - ]

It's on every amd cpu from zen4 onwards, every remotely recent Intel server cpu, and now again on intel starting with nova lake this year.

In the future, it will be everywhere.

breppp 4 hours ago [ - ]

I think the issue is that Intel skipped avx-512 for a few generations, meaning that you can't rely on it being available on your compile targets

jmalicki a day ago [ - ]

I don't know about this specifically, but I've seen a lot of big data jobs where 99% of the CPU was spent in JSON ser/deser. This might be a reasonable chunk of it.

aardvark179 a day ago [ - ]

JSON ser deser is usually dominated by floats rather than ints, and they are more expensive to handle.

xlii a day ago [ - ]

Though when we talk about JSON (5 bytes - 40 bits) the processing throughput is 20Gbps (this algorithm) vs 5Gbps (previous implementation, 4x slower).

I doubt CPU cycles are problem in that case.

jmalicki a day ago [ - ]

I agree - I was more talking about naive JSON ser/deser that Lemire was not involved in.

Aardwolf a day ago [ - ]

> An example of a galactic algorithm is the fastest known way to multiply two numbers,[4] which is based on a 1729-dimensional Fourier transform.[5]

That number looked familiar, and yep it's the taxicab number. Coincidence? Neither of the two references seems to mention it

scapp a day ago [ - ]

It's somewhat a coincidence. In the paper, the dimension has to be larger than this constant K which is shown to be no larger than 1728. This is a fairly crude estimate that comes from multiplying a bunch of small numbers together, so the fact that it ends up being 12³ is not unimaginable. Then taking the dimension to be one larger than this estimate on K, we get 1729 = 12³ + 1³ (= 9³ + 10³).

However, 1728 isn't the minimum possible value for K. With more precise estimates, sketched in the paper, we can bring it down to 8 + ε. So assuming that this finer estimate works, using a 9 dimensional Fourier transform would also make the algorithm be O(n log(n)).

jcelerier a day ago [ - ]

export to large files that represent numbers in textual format for instance ? this can be the difference between "waiting 10 seconds when hitting ctrl-s" and "the software saving automatically on each change because it's unnoticeable"

adrian_b a day ago [ - ]

I always use binary interchange formats between programs so I am not familiar with the overhead caused by format conversions. Even when displaying numbers for reading them, in the case of floating-point numbers that are displayed in the "scientific" format, i.e. with exponents, I prefer to have only the exponent as a decimal number, but the significand as a hexadecimal number. So I do not need fast algorithms for number conversions.

Nonetheless, there are plenty of people who advocate the use of JSON, XML and similar formats, in which case I assume that number conversions can take a non-negligible time, which might be decreased by such fast algorithms.

superjan a day ago [ - ]

You know, if can change code without overhead to ends of the pipeline, using the language & library of my choice, I’d do this too. For many of us this isn’t always the case.

superjan a day ago [ - ]

It’s faster for 3 digits and more. 3 digits is not galactic scale. Otoh, if over half of your numbers are single digits, it will lose to other implementations. I think that is more often the case that we’d like it to be.

po1nt a day ago [ - ]

We used it for payment processing. We got huge CSVs from various APIs and used string decimals for computing to avoid overflows/underflows and rounding errors.

Galanwe a day ago [ - ]

The reverse, string to integer, has huge applications in quant finance.