Those are just dimensions of different things, and it’s usually pretty clear from context which is meant. Color space has 3 dimensions, or 4 if you add transparency. An image pixel has 6 dimensions (x, y plus RGBA) if we take its color into account, but only 2 spatial ones. If you think of an image as a function that maps continuous xy coordinates to continuous RGBA values, then the image is a point in an infinite-dimensional function space. Embeddings have their own dimensions too, but none of them correspond to a token’s position in the text at hand, which is why, in this context, text is said to be 1D and an image 2D.
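To make the distinction concrete, here is a minimal sketch that separates "positional" axes (where an element sits) from "feature" axes (what the element holds). It assumes a text is stored as a list of token embeddings and an image as a grid of RGBA pixels, and it treats the last axis as the feature axis. The helper names `shape` and `positional_rank` and all the sizes are hypothetical, chosen only for illustration.

```python
# Sketch: count positional axes separately from feature dimensions.
# Assumption: the last axis of each nested list holds the features.

def shape(x):
    """Return the shape of a rectangular nested list, outermost axis first."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)

def positional_rank(x):
    """Number of positional axes, assuming the last axis holds the features."""
    return len(shape(x)) - 1

# A text: a sequence of 5 tokens, each embedded as an 8-dimensional vector.
text = [[0.0] * 8 for _ in range(5)]

# An image: a 4x3 grid of pixels, each an RGBA vector.
image = [[[0, 0, 0, 255] for _ in range(3)] for _ in range(4)]

# Each pixel viewed as a point in 6-dimensional (x, y, r, g, b, a) space.
points = [(x, y, *image[y][x]) for y in range(4) for x in range(3)]

print(shape(text), positional_rank(text))    # (5, 8) 1
print(shape(image), positional_rank(image))  # (4, 3, 4) 2
print(len(points[0]))                        # 6
```

However large the embedding size gets, the text still has one positional axis, while the image has two regardless of how many color channels each pixel carries.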