Next level:
Have the LLMs generate tests that measure the “ease of use” and “effectiveness” of coding agents using the language.
Then have them use these tests to get data for their language design process.
They should also smoke-test their own "meta-process" here. E.g., write a toy language that should obviously be much worse for LLMs, and then verify that the effectiveness tests produce a result agreeing with that.
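A minimal sketch of that negative control, assuming a hypothetical `effectiveness_score` that would really run coding agents on tasks in each language (here faked with fixed pass rates, pure illustration):

```python
import random

# Hypothetical stand-in for benchmarking coding agents in a language.
# A real version would run agents on tasks and score their success;
# here we simulate it with assumed pass rates (not measurements).
def effectiveness_score(language: str, trials: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    assumed_pass_rate = {"nice_lang": 0.8, "deliberately_bad_lang": 0.2}[language]
    return sum(rng.random() < assumed_pass_rate for _ in range(trials)) / trials

def smoke_test_meta_process() -> bool:
    # Negative control: the obviously-worse toy language must score
    # clearly lower, or the evaluation itself is suspect.
    good = effectiveness_score("nice_lang")
    bad = effectiveness_score("deliberately_bad_lang")
    return bad < good - 0.2  # demand a clear margin, not just any gap

print(smoke_test_meta_process())
```

If the smoke test fails, the problem is in the measurement, not the languages, which is exactly the point of running it before trusting the data.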
I await the blog post :)
Ugh, sounds like work; we are vibing here. Or can we also vibe-science? :)