How expensive would a "brute force" approach to decoding it be? I mean, how about mapping each unknown word to a known word in a known language and improving this mapping until a 'high score' is reached?

This seems to assume that a 1:1 mapping between words exists, but I don't think that's true for languages in general. Compound words, for example, won't map cleanly that way. Not to mention deeper semantic differences between languages due to differences in culture.

Correct

Mapping words 1:1 is not going to lead you anywhere (especially for a text that has remained undecoded for such a long time).

It kiiiinda works for very close languages (think Dutch<>German or French<>Spanish) and even then.

That’s a really interesting question — and one I’ve been circling in the back of my head, honestly. I’m not a cryptographer, so I can’t speak to how feasible a brute-force approach is at scale, but the idea of mapping each Voynich “word” to a real word in another language and optimizing for coherence definitely lines up with some of the more experimental approaches people have tried.

The challenge (as I understand it) is that the vocabulary size is pretty massive — thousands of unique words — and the structure might not be 1:1 with how real language maps. Like, is a “word” in Voynich really a word? Or is it a chunk, or a stem with affixes, or something else entirely? That makes brute-forcing a direct mapping tricky.

That said… using cluster IDs instead of individual words (tokens) and scoring the outputs with something like a language model seems like a pretty compelling idea. I hadn’t thought of doing it that way. Definitely some room there for optimization or even evolutionary techniques. If nothing else, it could tell us something about how “language-like” the structure really is.
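The cluster-ID idea above can be sketched in a few lines. This is a toy illustration only: the "ciphertext", the reference corpus, and the bigram-count score are all made-up stand-ins (a real attempt would score against an actual language model and a medieval corpus), and simple hill climbing stands in for whatever evolutionary search one might actually use.

```python
import random
from collections import Counter

# Toy reference corpus standing in for a real target-language corpus.
reference = "the king made the law and the people read the law".split()
vocab = sorted(set(reference))

# Bigram counts act as a crude stand-in for a language-model score.
bigrams = Counter(zip(reference, reference[1:]))

def score(tokens):
    # Higher when adjacent pairs look like the reference language.
    return sum(bigrams.get(pair, 0) for pair in zip(tokens, tokens[1:]))

# Unknown text represented as cluster IDs rather than raw word forms.
ciphertext = [0, 1, 2, 0, 3, 4, 0, 5, 6, 0, 3]

# Start from a random cluster-ID -> word mapping and hill-climb on the score.
random.seed(0)
mapping = {cid: random.choice(vocab) for cid in set(ciphertext)}
best = score([mapping[c] for c in ciphertext])
for _ in range(2000):
    cid = random.choice(list(mapping))
    old = mapping[cid]
    mapping[cid] = random.choice(vocab)
    s = score([mapping[c] for c in ciphertext])
    if s >= best:
        best = s
    else:
        mapping[cid] = old  # revert a worsening move

print(best, [mapping[c] for c in ciphertext])
```

Even at this toy scale the search space is `|vocab| ** |clusters|`, which hints at the computational-complexity objection raised elsewhere in the thread.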

Might be worth exploring — thanks for tossing that out; hopefully someone with more awareness or knowledge in the space sees it!

Like I said in another post (sorry for repeating): since this was during the 1500s, the main thing people would've been encrypting back then was biblical text (or texts of other religions).

Maybe a version of scripture that had been "rejected" by some King, and was illegal to reproduce? Take the best radiocarbon dating, figure out who was King back then, and if they 'sanctioned' any biblical translations, and then go to the version of the bible before that translation, and this will be what was perhaps illegal and needed to be encrypted. That's just one plausible story. Who knows, we might find out the phrase "young girl" was simplified to "virgin", and that would potentially be a big secret.

Is this grey cause it talks about religion? That stuff was bigger in 1500 than in 2000; from that lens, religious text seems a reasonable track to follow.

Other than war plans, religious text was pretty much the only thing in the 1500s that would have been encrypted. However war plans would be very unlikely to be disguised as a botany book, for all kinds of reasons. War plans are temporary, not something you'd dedicate that level of artistic effort and permanence to.

The art of war by Sun Tzu is pretty timeless tho

Right, because it's not a war plan. A war plan is about when, where, how, who, etc, for specific attack(s).

yes indeed, more like a war blueprint? like in general strategies applicable to many battles (so you can infer plans for any n wars of the future)

idk

I mean it's theoretically possible a 1500s King might have made that book illegal, because of its general knowledge. That's a legit point.

Sadly the radiocarbon dating disproved two of my far-out theories, which were: 1) the book survived from some earlier 'iteration' of life on the planet, where all plants were simply different; or 2) all planets form the same 'kind' of carbon-based life, and this book was sent/delivered to us by another planet.

Sadly, it's probably just someone's form of "art", and not even "real".

It might be a good idea for a SETI@home like project.

I don't think that's likely possible. How would you determine the score? Where would you get your corpus of medieval words? How would you deal with the insane computational complexity?

Peculiarities in Voynich also suggest that one-to-one word mappings are very unlikely to result in a well-formed known language. For instance, there are cases of repeated word sequences you don't really see in regular text. There's a lack of the extremely common words you would expect to be necessary for a word-based structured grammar, there are signs of at least two 'languages', character distributions within words don't match any known language, etc.

If there still is a real unencoded language in here, it's likely to be entirely different from any known language.
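One of those peculiarities, repeated word sequences, is easy to check on any transliteration. A minimal sketch (the sample tokens below are made up for illustration, not real EVA transliteration data):

```python
def repeated_adjacent(tokens):
    # Count immediate word repetitions like "daiin daiin",
    # which are rare in natural-language text but common in Voynich.
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)

# Hypothetical sample; a real run would use an actual transliteration file.
sample = "daiin daiin qokeedy qokeedy qokeedy chol shedy chol".split()
print(repeated_adjacent(sample))  # 3: one pair plus two from the triple
```

Comparing this count against the same statistic on a real-language corpus of similar length is one concrete way to measure how "language-like" the text is.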