What I meant was similar to the article. Say I have bit vectors of length 1000 bits (that will be the embedding), and every concept I want to model corresponds to a fixed choice of 10 bits set to 1, with the remaining bits 0. (These vectors are quasi-orthogonal.) Then I can easily superpose about 50 concepts into a single sentence (one bitvector) by OR-ing them together, with a relatively small chance of overlap, i.e. all of the concepts remain decodable from that bitstring.
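A minimal sketch of what I mean (the parameters and names here are just mine, for illustration): each concept gets a random 10-of-1000 bit pattern, a sentence is the bitwise OR of 50 of them, and decoding checks whether all of a concept's bits survive in the union:

```python
import random

N_BITS = 1000      # embedding length
K_ONES = 10        # set bits per concept
N_CONCEPTS = 500   # vocabulary size
SENTENCE_LEN = 50  # concepts superposed into one sentence

rng = random.Random(0)

# Each concept is represented by the set of its K_ONES bit positions.
concepts = [frozenset(rng.sample(range(N_BITS), K_ONES))
            for _ in range(N_CONCEPTS)]

# Encode: OR the chosen concepts together (union of their set-bit positions).
chosen = rng.sample(range(N_CONCEPTS), SENTENCE_LEN)
sentence = set().union(*(concepts[i] for i in chosen))

# Decode: a concept is "present" if all of its bits are set in the sentence.
decoded = [i for i, c in enumerate(concepts) if c <= sentence]

print(f"{len(sentence)} of {N_BITS} bits set after OR")
print("all encoded concepts recovered:", set(chosen) <= set(decoded))
print("false positives:", sorted(set(decoded) - set(chosen)))
```

The decode step is essentially a Bloom-filter membership test: after OR-ing 50 concepts, roughly 400 of the 1000 bits are set, so the chance that an unrelated concept's 10 bits all happen to land on set bits is about 0.4^10, which is why false positives stay rare.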

But it's quite similar to what the top comment is saying about spherical codes. I think my comment is also about using coding theory to represent concepts.

Other than that, I don't have any issue with dot products over bitvectors - they're just not very useful for the above.