I tried my own test recently:
"Write a history of the Greek language but reverse it, so that one would need to read it from right to left and bottom to top."
ChatGPT wrote the history and showed absolutely no awareness, let alone "understanding," of the second half of the prompt.
With o3-mini-high (just the last paragraph):
civilization Mycenaean the of practices religious and economic, administrative the into insights invaluable provides and B Linear as known script the in recorded was language Greek the of form attested earliest The
Oh, interesting, what do you get when you specify that the letters need to be reversed, too? (That was what I meant, and the original prompt explicitly stated that requirement; I forgot to include it in my summary of the 'test' here.)
As much as I think AI is overhyped too, that is a prime use case that would be better solved by passing the text to a tool rather than jamming a complex transformation like that into its latent space.
A real intelligence would recognize that this task is better solved with an automated tool and actually do so. ChatGPT is capable of writing and executing Python code, but it doesn't occur to it to use that in cases like this.
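For what it's worth, the transformation itself is a few lines of Python, exactly the kind of thing a tool call could handle. A minimal sketch (the function name and sample text here are just illustrative):

    def mirror_text(text: str) -> str:
        # Make text read right-to-left and bottom-to-top: reverse the
        # characters within each line, then reverse the line order.
        return "\n".join(line[::-1] for line in reversed(text.splitlines()))

    history = ("The earliest attested form of the Greek language\n"
               "was recorded in the script known as Linear B.")
    print(mirror_text(history))
    # .B raeniL sa nwonk tpircs eht ni dedrocer saw
    # egaugnal keerG eht fo mrof detsetta tseilrae ehT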
Thanks, that was essentially the test. I've gotten into a number of disagreements with people on HN about whether LLMs are 'just' token predictors, whether they 'understand' (whatever we mean by that), whether there's a guiding intelligence, whether they're 'just' language calculators, etc.
As someone else in this thread nicely put it, the tools are being sold as a hop, skip, and jump away from AGI. They clearly aren't. ChatGPT tells us to "ask anything." I did that. There is no 'there' there with these tools. They aren't even dumb.
I have a similar test for image gens. I try to get them to write reversed text in condensation on windows. The new GPT is the best so far; it can sorta, maybe, do it sometimes. Others will sometimes reverse the letter order, but not flip each character.
Try playing a game of Hangman with ChatGPT. It's hilarious.
It does surprisingly well!
Edit: scratch that, it thought there was a six-letter word starting with "trs" and then changed its mind to "tre" when I guessed "e." Hilarious.
Just copied your prompt and it handled it just fine.
?siht ekil kool rewsna eht diD
Edit: realized just now that my summary of the 'test' failed to specify the request fully: the letters need to be reversed, too. Maybe I'm just bad with AI tools, because I didn't even get a response that 'this like looked' (i.e. reversed the order of the words).
LLMs work with tokens, not letters. So that's not going to work.
It might work in an agent system where it can make and execute code to solve problems.
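To make the token point concrete, here's a sketch using OpenAI's tiktoken library (assuming the cl100k_base encoding; the exact splits vary by model):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["administrative", "evitartsinimda"]:
        pieces = [enc.decode([t]) for t in enc.encode(text)]
        print(f"{text!r} -> {pieces}")

    # The forward word encodes as a token or two, while its mirror image
    # shatters into several unrelated fragments, so the model never
    # "sees" the individual letters it would need to reverse.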
By the way, why _do_ LLMs work with tokens and not letters? Is that some kind of optimization, or is there a more fundamental reason for it?
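Largely an optimization, as I understand it: subword tokenization (BPE and its relatives) cuts the sequence length several-fold, and transformer attention cost grows roughly quadratically with sequence length. A rough comparison (again using tiktoken; the counts are approximate):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    sentence = ("The earliest attested form of the Greek language was "
                "recorded in the script known as Linear B.")

    print(len(sentence))              # ~95 characters
    print(len(enc.encode(sentence)))  # ~20 tokens

    # A character-level model would attend over roughly five times as
    # many positions for the same text, which is far more expensive.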
I actually would love to see the output of an LLM that used letters or syllables or phonemes. The output when it makes mistakes would be absolutely wild.