> The implications of these geometric properties are staggering. Let's consider a simple way to estimate how many quasi-orthogonal vectors can fit in a k-dimensional space. If we define F as the degrees of freedom from orthogonality (90° - desired angle), we can approximate the number of vectors as [...]

If you're just looking at minimum angles between vectors, you're doing spherical codes. So this article is an analysis of spherical codes… that doesn't reference any work on spherical codes… seems to be written in large part by a language model… and has a bunch of basic inconsistencies that make me doubt its conclusions. For example: in the graph showing the values of C for different values of K and N, is the x axis K or N? The caption says the x axis is N, the number of vectors, but later they say the value C = 0.2 was found for "very large spaces," and in the graph we only get C = 0.2 when N = 30,000 and K = 2---that is, 30,000 vectors in two dimensions! On the other hand, if the x axis is K, then this article is extrapolating a measurement done for 2 vectors in 30,000 dimensions to the case of 10^200 vectors in 12,888 dimensions, which obviously is absurd.

I want to stay positive and friendly about people's work, but the amount of LLM-driven stuff on HN is getting really overwhelming.

Spherical codes are kind of obscure: I haven't heard of them before, and Wikipedia seems to have barely heard of them. And most of the Google results seem to be about playing golf in small dimensions (i.e., how many can we optimally pack in n < 32 dimensions?).

People do indeed rediscover previously existing math, especially when the old content is hidden under non-obvious jargon.

The problem with saying something is LLM-generated is that it cannot be proven, and it's a less helpful way of saying it has errors.

Pointing out the errors is a more helpful way of stating problems with the article, which you have also done.

In that particular picture, you're probably correct to interpret it as C vs N as stated.

> The problem with saying something is LLM-generated is that it cannot be proven, and it's a less helpful way of saying it has errors.

It's a very helpful way of saying it isn't worth the bother of reading. After all, if they couldn't be bothered to write it, I can't be bothered to read it.

You have no idea how much personal work went into it. You just suspect it was worded with an LLM.

I have been using embeddings for almost a decade and am well versed in their intricacies. I think this article has merit. The direction of the investigation and the conclusion are interesting; it's good to have people thinking about how many distinct concepts can be packed into our usual embedding dimensions. I wonder how small you can make the embedding before a model becomes noticeably worse, holding parameter count constant.

The complaint was that the post has a lot of basic inconsistencies which is a problem regardless.

If your content is as bad as AI slop, it doesn't really matter whether it actually is or not. But I think it's safe to assume that when a verbose and grandiose post is internally inconsistent and was written after 2022, it's slop [0].

0 https://pyxis.nymag.com/v1/imgs/f0e/0bb/d9346e02d8d7173a6a9d...

What is "AI slop?" I could have a 2 hour long discussion with Claude and in the end have it formalize it as an article. Is that AI slop?

Yes

Ok, so it just means "worded with AI", no relation to how much thinking went in

Anyone interested in a subject can, if they wish, ask an AI about it and get an answer. Your deep conversation with an AI is something fun and insightful only to yourself. You're welcome to do it, but don't pretend it has meaning to anyone else, because if they want to get the same insight on a topic they can do it themselves for the same amount of effort (none).

> It's a very helpful way of saying it isn't worth the bother of reading.

Not really.

[deleted]

There's a big difference between the type of errors that humans make when they misunderstand a subject, and the type of errors an LLM makes. I'm not well enough versed in the field to know which type of errors are found in this paper, but it seems that people who are versed in this field feel that these errors are of the latter type.

It wouldn't kill the OP to say an LLM was used, btw, instead of putting readers in a difficult position. Nobody likes being played.

Agreed. What writing would you recommend for understanding the geometric and information-theoretic properties of high-dimensional vector spaces and spherical codes?

There's a lot of beautiful writing on these topics on the "pure math" side, but it's hard to figure out what results are important for deep learning and to put them in a form that doesn't take too much of an investment in pure math.

I think the first chapter of [1] is a good introduction to general facts about high-dimensional stuff. I think this is where I first learned about "high-dimensional oranges" and so on.

For something more specifically about the problem of "packing data into a vector" in the context of deep learning, last year I wrote a blog post meant to give some exposition [2].

One really nice approach to this general subject is to think in terms of information theory. For example, take the fact that, for a fixed epsilon > 0, we can find exp(C d) vectors in R^d with all pairwise inner products smaller than epsilon in absolute value. (Here C is some constant depending on epsilon.) People usually find this surprising geometrically. But now, say you want to communicate a symbol by transmitting d numbers through a Gaussian channel. Information theory says that, on average, I should be able to use these d numbers to transmit C d nats of information. (C is called the channel capacity, and depends on the magnitude of the noise and e.g. the range of values I can transmit.) The statement that there exist exp(C d) vectors with small inner products is related to a certain simple protocol to transmit a symbol from an alphabet of size exp(C d) with small error rate. (I'm being quite informal with the constants C.)
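If it helps make that concrete, here's a small numpy sketch (my own toy numbers, not code from [1] or [2]): draw far more random unit vectors than dimensions and look at the largest pairwise inner product. Random vectors are essentially the existence proof behind the exp(C d) statement, and they also play the role of the random codebook in the channel-coding picture.

    import numpy as np

    # Toy check: N random unit vectors in R^d, with N a few times larger than d,
    # still have all pairwise inner products well below 1. A union bound says
    # roughly exp(d * eps^2 / 4) such vectors can coexist below a threshold eps.
    rng = np.random.default_rng(0)
    d, N, eps = 1000, 3000, 0.2

    V = rng.standard_normal((N, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # normalize rows to unit length

    G = V @ V.T
    np.fill_diagonal(G, 0.0)
    print("max |<v_i, v_j>|:", np.abs(G).max())          # comes out around 0.18 for these sizes
    print("union-bound count:", np.exp(d * eps**2 / 4))  # ~2.2e4 vectors allowed below eps = 0.2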

[1] https://people.math.ethz.ch/~abandeira//BandeiraSingerStrohm... [2] https://cgad.ski/blog/when-numbers-are-bits.html

Can't wait to see your post on the topic.

In the graph you’re referencing, K = 2 never reaches C = 0.2. K = 3 only reaches C = 0.3 before getting cut off.

I’m not even sure why that would be a problem, because he’s projecting N basis vectors into K dimensions and C is some measure of the error this introduces into mapping points in the N-space onto the K-space, or something. I’m glad to be shown why this is inconsistent with the graph, but your argument doesn’t talk about this idea at all.
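Concretely, the reading I have in mind is something like this sketch (toy sizes, my own code, not the article's): push the N standard basis vectors of R^N through a random Gaussian map into R^k, and measure how far the images are from being orthonormal.

    import numpy as np

    # Toy version of "project N basis vectors into k dimensions and measure the error":
    # the images of the basis vectors are just the scaled columns of a random matrix,
    # and eps is the worst-case distortion of their norms and pairwise inner products.
    rng = np.random.default_rng(0)
    N, k = 2000, 128

    A = rng.standard_normal((k, N)) / np.sqrt(k)
    U = A.T                                   # row i is the image of basis vector e_i

    norms = np.linalg.norm(U, axis=1)
    G = U @ U.T
    np.fill_diagonal(G, 0.0)
    eps = max(np.abs(norms - 1).max(), np.abs(G).max())
    print("eps:", eps, " k * eps^2 / log N:", k * eps**2 / np.log(N))

Whether that's actually the quantity the article is plotting is exactly the part I'd want spelled out.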

I was a little informal with my argument. It's not strictly true that we only see C = 0.2 when k = 2. I was reading what the graph says about the case when N is much greater than k. I'll try to clarify.

C is meant to be the smallest constant so that, for each (N, k, epsilon) with k > C epsilon^-2 log N and epsilon > 0, there exists some arrangement of N vectors in R^k with inner products smaller than epsilon in absolute value. For each (N, k), the author optimized epsilon and reported the empirical value k epsilon^2 / log N, which is the smallest value of C for which our condition holds restricted to the given values of (N, k). (Of course, this is my good-faith interpretation---the article introduces C in the context of a JL distortion bound, and it takes an extra step to turn that into a bound on inner products.)
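As a rough stand-in for the experiment (I don't know exactly what the author ran, so treat this as an assumption), the empirical value can be produced like this: for a given (N, k), spread N unit vectors on the sphere by gradient descent, take epsilon to be the largest surviving |inner product|, and report k epsilon^2 / log N.

    import numpy as np

    def empirical_C(N, k, steps=2000, lr=0.1, seed=0):
        # Spread N unit vectors in R^k by descending the sum of squared
        # off-diagonal inner products. The optimizer is crude, so the true
        # optimal eps (and hence the reported constant) may be smaller.
        rng = np.random.default_rng(seed)
        V = rng.standard_normal((N, k))
        V /= np.linalg.norm(V, axis=1, keepdims=True)
        for _ in range(steps):
            G = V @ V.T
            np.fill_diagonal(G, 0.0)
            V -= lr * (4 * G @ V) / N                       # gradient of sum_{i != j} <v_i, v_j>^2
            V /= np.linalg.norm(V, axis=1, keepdims=True)   # project back onto the sphere
        G = V @ V.T
        np.fill_diagonal(G, 0.0)
        eps = np.abs(G).max()
        return eps, k * eps**2 / np.log(N)

    for N, k in [(100, 10), (500, 10), (2000, 10)]:
        eps, C = empirical_C(N, k)
        print(f"N={N:5d}  k={k}  eps={eps:.3f}  k*eps^2/log(N)={C:.3f}")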

It can be shown that C = 4 satisfies this condition, when log is the natural log. See [1], for example. Based on the graph, the article claims to do better: "for sufficiently large spaces," it says we can put C = 0.2. This would be a very significant improvement.

For k = 2, the graph shows that C will be lower than 0.2 for sufficiently large N. (The color scheme is confusing! The line for k = 2 is the one that starts at just under 0.8 when N = 1.) Already for k = 3, the graph doesn't give us reason to believe it will be lower than 0.2; you correctly observed it gets to around 0.3. For larger values of k, the graph doesn't seem to show what we can expect for large N: the curves go up, but do not come down. This is what I meant in my comment: the conclusion that C <= 0.2 as N -> infinity is only justified by the behavior at k = 2.

Now, do these results make sense? In the case k = 2, we're trying to put a large number (N) of vectors on the unit circle, and thinking about how small the maximum inner product (epsilon) between any pair of vectors can be. As N -> infinity, the vectors will be packed very tightly and the maximum inner product epsilon will come close to 1. Overall, C = k epsilon^2 / log N will become arbitrarily small. In fact, the same happens for every k.
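To put numbers on the k = 2 case, take the evenly spaced arrangement of N lines through the origin, for which the largest |inner product| is cos(pi/N):

    import numpy as np

    # k = 2: N evenly spaced lines give eps = cos(pi/N), so the reported constant
    # C = k * eps^2 / log N drifts to 0, but only at a 1 / log N rate.
    for N in [10, 100, 1_000, 30_000, 1_000_000]:
        eps = np.cos(np.pi / N)
        print(f"N={N:>9,d}  eps={eps:.4f}  C={2 * eps**2 / np.log(N):.3f}")

This also matches the observation upthread that the k = 2 curve only dips to 0.2 around N = 30,000.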

So, just in connection to this graph, the article makes three mistakes:

1) The article's interpretation of its experiment is wrong: the graph alone doesn't show that C < 0.2 for "large spaces".

2) However, it should be obvious a priori that, for all values of k, the reported values of C should converge to 0 for large N (albeit very slowly, at a rate of 1/log N).

3) Unfortunately, this doesn't tell us anything about the minimum value of k / log(N) for a given epsilon and k, and so it doesn't support the conclusion of the article.

The problem with this kind of LLM-driven article is that it gives uncareful work the _appearance_ of careful work but none of the other qualities that usually come with care.

[1] https://lmao.bearblog.dev/exponential-vectors/

If C is the same for all K, then can't he extrapolate the trend from K = 2 (which you're right, I did misread) to all other values of K? That seems to be his justification for the claim you're disputing.

I do think some more rigor is needed here (especially with respect to whether his derivation of C is sensible, as you allude to, since it obviously can't converge to 0), but I hope we can agree that we are well into the "amateur who has trouble presenting as such" territory rather than the "AI slop" territory.