I agree, it's mostly a silly whim taken too far. Too much time on my hands.

In particular, the whole stack-based thing looks questionable.

In fact, the very first answer from Gemini proposed an APL-like encoding of the primitives to save tokens, but when I started the implementation, Claude Code pushed back on that, saying it would need to keep some sane semantics around the keywords to be able to understand the programs.

The very strict verification story seems more plausible, and tracks with the rest of the comments here.

What has surprised me is that the language works at all; adding todo items to a web app written in a week-old language felt a bit eerie.

Next level:

Have the LLMs generate tests that measure the “ease of use” and “effectiveness” of coding agents using the language.

Then have them use these tests to get data for their language design process.

They should also smoke test their own “meta process” here: e.g., write a toy language that should be obviously much worse for LLMs, and then verify that the effectiveness tests agree with that.
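The smoke test could look something like this sketch (all names here are hypothetical, and the stubbed results stand in for real agent runs):

```python
# Hypothetical sketch: compare agent "effectiveness" across two languages
# by pass rate on a shared task suite. TaskResult and the stubbed data
# are illustrative, not from any real harness.
from dataclasses import dataclass

@dataclass
class TaskResult:
    task_id: str
    passed: bool
    tokens_used: int

def effectiveness(results):
    """Fraction of tasks passed; a crude proxy for agent effectiveness."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)

def avg_tokens(results):
    """Mean tokens spent per task; a rough proxy for 'ease of use' cost."""
    return sum(r.tokens_used for r in results) / max(len(results), 1)

# Stubbed runs: the new language vs. a deliberately hostile toy language
# used as a sanity-check baseline.
new_lang = [TaskResult("todo-add", True, 1200),
            TaskResult("todo-del", True, 900),
            TaskResult("todo-edit", False, 2100)]
toy_lang = [TaskResult("todo-add", False, 4000),
            TaskResult("todo-del", True, 3500),
            TaskResult("todo-edit", False, 5200)]

# The meta smoke test: the obviously-worse language should score worse.
# If it doesn't, the effectiveness metric itself is suspect.
assert effectiveness(new_lang) > effectiveness(toy_lang)
assert avg_tokens(new_lang) < avg_tokens(toy_lang)
```

With real data, the stubbed lists would be replaced by actual agent runs, and the two assertions are the calibration check on the metric itself.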

I await the blog post :)