It reminded me of "Text embeddings reveal almost as much as text" from 2023 (https://news.ycombinator.com/item?id=37867635) - and yes, they do cite it.

This has huge implications for privacy. There is a common mental model that embedding vectors are like hashes - that you can safely store them in a database even though you would not store the plain text.

That assumption is incorrect - a good embedding stores ALL of it: not just the general gist, but dates, names, and passwords.

There is an easy fix for that - a random rotation, which preserves all distances.
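A minimal numpy sketch of the idea (the dimension and vectors are made-up stand-ins for real embeddings):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 768  # stand-in embedding dimension

    # Sample a random orthogonal matrix Q via QR decomposition of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))  # sign fix so Q is uniform over the orthogonal group

    # Stand-ins for two document embeddings.
    a = rng.standard_normal(d)
    b = rng.standard_normal(d)

    # Rotate before storing.
    a_rot, b_rot = Q @ a, Q @ b

    # Euclidean distance and inner product (hence cosine similarity) are unchanged.
    print(np.allclose(np.linalg.norm(a - b), np.linalg.norm(a_rot - b_rot)))  # True
    print(np.allclose(a @ b, a_rot @ b_rot))  # True

The intuition is that Q acts like a secret key: search still works on the rotated vectors, but they no longer sit in the public embedding model's coordinate system.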

This mental model is also in direct contradiction with the whole purpose of the embedding, which is to describe the original text in a more interpretable form. If a piece of content in the original can be used for search, comparison, etc., then pretty much by definition it has to be stored in the embedding.

Similarly, this result can be rephrased as "Language Models process text." If the LLM weren't invertible with respect to a piece of input text, it couldn't attend to that text either.

> There is an easy fix for that - a random rotation, which preserves all distances.

Is that like homomorphic encryption, in a sense, where you can compute the encryption of a function of the plaintext without ever seeing the input or the function's plaintext output?
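For the distance-based part of the analogy, here's a toy continuation of the numpy sketch above (corpus and query are random stand-ins): nearest-neighbor search run purely on the rotated vectors returns the same answer as on the originals.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 768, 1000

    # Same construction as above: a random orthogonal "key" Q.
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))

    corpus = rng.standard_normal((n, d))  # stand-in document embeddings
    query = rng.standard_normal(d)        # stand-in query embedding

    # Nearest neighbor on the plaintext vectors...
    best_plain = np.argmax(corpus @ query)

    # ...matches the one computed only from rotated vectors.
    best_rotated = np.argmax((corpus @ Q.T) @ (Q @ query))
    print(best_plain == best_rotated)  # True

Unlike true homomorphic encryption, though, only distance- and inner-product-based computations survive the rotation, and anyone who learns Q can undo it.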