This is why misspellings and homophones are tells of human righting. LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.

Funny, I’ve been cracking[0] at this exact problem with a purpose-built model[1]:

0: https://huggingface.co/posts/omarkamali/593639295164067

1: https://omneitylabs.com/models/sawtone

Claude the other day wrote code where one of the bytes in the array was 0xO5.

That's zero ex oh (the letter) five

> righting.

> LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.

Is this an elaborate joke or your full-word misspelling of writing is both agreeing with your statement (word substitutions) and contradicting it (not semantic but only pronunciation similarity)

I don't see the contradiction, unless you believe that the grandparent comment was written by an LLM.

[deleted]