> I find it interesting to consider that if you pick a value at random, it will usually fail! That is, most 64-bit integers cannot be written as the product of two 32-bit integers.
While I find the 17% number interesting to think about, "most" is far less interesting. Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63. That's a hair's breadth away from "most" already, and considering even a tiny amount of overlapping results gets you there.
What gets interesting is actually trying to quantify the overlapping results.
The word “most” is indeed not very surprising if taken to mean >50% (as you do), but the surprising fact (admittedly not relevant for 2^64 in the quoted sentence) is that “most” can be replaced with “almost all” in the technical sense meaning “tending to 1” i.e. “1 - o(1)”, i.e. “eventually greater than 1-epsilon for any epsilon”.
Here's the multiplication table up to 9 (so for n=10 in place of n=2^64), and it already contains only 37 distinct products among its 100 entries:
That is, in decimal, only 37% of (up to) two-digit numbers can be written as products of two one-digit numbers. This fraction, which drops to 28% at n=100, only drops to 17% at n=2^64 (per the article). So it decreases VERY slowly, and it's nontrivial that it actually goes to 0.It's an interesting fact, but it's weakened a lot by happening so slowly and not matching the everyday definition of "almost all".
And it's weakened even more by realizing that while you can get the raw fraction as low as you want, shrinking your list of products by n digits requires numbers with an exponential number of digits.
Yeah the number sounds a lot less impressive if you say that you only get 2^61.44 integers out of 2^64. In other words, a 4% entropy loss.
Information quantities are more meaningfully expressed in number of bits.
Exactly. Being precise about logarithmic vs linear utilizations is key here. I tried making a similar point about the inefficiency of IEEE-754 redundant NaN encodings here: https://arxiv.org/pdf/2508.05621
Having ~a quadrillion redundant bitstrings all mapping to NaN sounds pretty bad, but logarithmic/information utilization-wise, this is actually not too bad.
I've been looking at the 8087 NaN circuitry lately. Having 2^53 (or whatever) values for NaN was supposedly a feature: "the large number of NAN values that are available, provide the sophisticated programmer with a tool that can be applied to a variety of special situations." For example, the different NaN values could hold debugging information to track down errors.
See "8087 Numeric Data Processor" page S-74: https://ethw.org/w/images/2/2f/Intel_8086_family_users_numer...
You'll see some scripting languages (ab)use this. Where the native "number" type is a 64 bit float and only one NaN bit pattern is a real NaN. The others smuggle a pointer to an object in the lower bits. This way you don't spend any memory overhead indicating if a given variable contains a primitive or an object.
> For example, the different NaN values could hold debugging information to track down errors.
Unfortunately IEEE didn't bother specifying NaN propagation semantics so it ended up pretty useless.
All the primes above 2^32 are out, but that accounts for only two point something percent.
But also all of their multiples. I suspect that those account for the vast majority.
Each x is prime with probability 1/ln(x), each x has M/x multiples less than M, as a fraction of M that is just 1/x. Together that makes 1/(x ln(x)) with the indefinite integral ln(ln(x)). If we plug in 2^32 and 2^64 [1], we get ln(2). So about 69.3 % of all 64 bit integers should have a prime factor larger than 2^32 and therefore not be the product of two 32 bit integers. That leaves about 13 % unaccounted. Three prime factors all larger than 2^32/2, five prime factors all larger than 2^32/3, and so on cannot be packed into two 32 bit integers. Not sure to how much this will add up.
[1] The bounds are important because they guarantee that there is at most one prime factor from that range and this ensures that we are not double counting anything. If the upper bound was larger than the square of the lower bound, then we would have to worry about double counting numbers with more than one large prime factor.
Nice work. I guess "vast majority" is overstating the case, but majority anyway.
The simple explanation is just that lot of ways that numbers can multiply to produce the same product
Like an odd number x times an even number y, x* y produces the same product as x* 2 and y/2
Same for a multiple of 3 c and and a non multiple of 3 d, c * d = c/3 * d*3
This is just looking at it from a different perspective. Both, one 64 bit integer and the product of two 32 bit integers represent a number up to 2^64 with 64 bits. But while all 64 bit integers are unique, there are, as you say, several representation for some numbers as the product of two 32 bit integers and therefore it is impossible to represent all 64 bit integers. Commutativity alone costs you about 50 percent of all numbers in the range as x * y and y * x represent the same 64 bit number but with two different representation as a product of two 32 bit numbers, at least if x and y are different. But this tells you nothing about the numbers that you can not represent, only that they must exist. I was looking at it from this other perspective, which numbers are not representable as a product and why.
> Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63.
Not sure I understand.
Adding two 32 bit integers takes you to 33 bit integers. (1111 + 1111 = 11110).
Addition doesn't care about order, so you're instantly cutting 2^33 possibilities down to 2^32. Or so is your argument. But in reality you can reach nearly all of those 2^33 numbers.
Concatenating arbitrary 32 bit ints covers all possible 64 bit ints. So the space of all pairs of 32 bit ints is in bijection with 64 bit ints.
Commutativity introduces a relation on pairs of 32 bit ints (a,b) ~ (b,a), which accounts for one bit of information. Thus, at most 50% of 64bit ints show up as products of 32 bit ints.
Ah, fair enough, thanks everyone. So basically the argument is if that we have a deterministic function taking a pair (x_1, x_2) with x_i in X with |X| = M, then the function can produce at most M^2 outputs. And knowing that the function is symmetric cuts it down to M(M+1)/2. (Which is still far bigger than the 2M in my addition analogy.) Cheers.
Except the perfect squares don't reduce by half, so it's not quite 50% but it's very close.
Some of them, with notable exception of perfect squares of primes, can be expressed by products of different combinations of factors.
E.g., 6^2 = (223)3 = 2(233).
One of ~22 (ln(2^32)) perfect squares will be a square of perfect prime. Most won't.
Decoding this so it's readable: 6^2 = (2*2*3)*3 = 2*(2*3*3).
(You can escape * with \: \*)
Ha, fair!
The 2^64 in gps argument comes from the number of pairs of 32 bit numbers, not from the upper bound of multiplying two 32 bit numbers. So for the addition case the symmetry argument is still only good enough to get you down to about 2^63, which doesn't help you at all because you have much stronger information from the upper bound.
Addition in this case is cutting from 2^64 to 2^33-1.
The 2^64 number is the number of inputs. For an operation which is commutative, you expect the outputs to be 2^63+2^32 or smaller, since you’ve introduced symmetry.
Why does order matter?
Whether a 64-bit number can be written as the product of two 32-bit ones depends only on the prime factors of the 64-bit number - it's a property of the number itself, and apparently 17% of 64-bit numbers have this property.
The input space is 32 + 32 = 64 bits. The output space is 64 bits. So the best you can do is an 1-to-1 mapping.
However, since a * b = b * a, our input space has a lot of duplicate outputs. So from this alone you can conclude roughly half of the output space must be uncovered by any input pair, simply because there aren't enough input pairs.
OK - thanks. I must have misunderstood what the other poster was saying, since I thought they were objecting to the "most" characterization.
I wasn't saying it's wrong, I was saying that "most" is so easy to reach that it's a trivial and rather boring threshold.
Ok so I think I understand your insight: the number of 64 bit numbers you can get from multiplying two 32 bit numbers is the number of distinct results. I guess it follows that, of those 64 bit integers that can be written as the product of 2 32 bit ints, on average they can be factored into 32 bit ints 6 different ways. The only ones that could possibly be written as such a product exactly one way are the prefect squares.
A lot of the remaining is multiples of 4, which you can either get from having a 2 in both factors or a 4 in one (multiples of 9 are similar).
... or just considering the even numbers almost all of them are 2 x N where N>2^32 and that gets you to within a hair of "most" and if you add in the odd thirds for which the same is true you get a bound of 2/3 - epsilon.
It's a bit more subtle than that -- most n>2^32 are not prime in which case 2 x n has more factorizations you would have to check.
(Just by way of example, for n=2^33, 2n=2^34 but also =2^17*2^17)
> While I find the 17% number interesting to think about, "most" is far less interesting. Multiplication doesn't care about order so you're instantly cutting 2^64 possibilities down to about 2^63. That's a hair's breadth away from "most" already
It's much worse than that. It's difficult for a 64-bit product to have the high bit set if the multiplicands are both no larger than 32 bits.
I wouldn't say difficult. To set the high bit, the geometric mean of the two multiplicands has to be at least 3 billion. 32 bit numbers go up to 4.3 billion so that's quite common.
If I did the correct integral (the area inside the unit square above y=.5/x), then 15.34% of products should have the high bit set. That's much less than half but it's still happening constantly.