John Carmack made this observation about CLI-centred development for agents a year ago:

LLM assistants are going to be a good forcing function to make sure all app features are accessible from a textual interface as well as a gui. Yes, a strong enough AI can drive a gui, but it makes so much more sense to just make the gui a wrapper around a command line interface that an LLM can talk to directly.

https://x.com/ID_AA_Carmack/status/1874124927130886501

https://xcancel.com/ID_AA_Carmack/status/1874124927130886501
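Concretely, the "GUI as a wrapper around a CLI" shape might look like this minimal sketch (my own illustration, not Carmack's design; imgtool, resize, and the GUI handler are made-up names):

    import argparse

    def resize(path: str, width: int) -> str:
        # Stand-in for a real app feature; returns text so an agent can read it.
        return f"resized {path} to width={width}"

    def build_parser() -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(prog="imgtool")
        sub = parser.add_subparsers(dest="cmd", required=True)
        p = sub.add_parser("resize", help="resize an image")
        p.add_argument("path")
        p.add_argument("--width", type=int, default=800)
        return parser

    def run(argv: list[str]) -> str:
        # The single entry point; CLI, GUI, and LLM callers all funnel through it.
        args = build_parser().parse_args(argv)
        return resize(args.path, args.width)

    if __name__ == "__main__":
        import sys
        print(run(sys.argv[1:]))

    # A GUI button handler is then a thin wrapper over the same entry point:
    #   def on_resize_click():
    #       output_label.set(run(["resize", current_file, "--width", "1024"]))

Anything the GUI can do, an LLM can do by emitting the same command strings.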

Andrej Karpathy reiterated it a couple of weeks ago:

CLIs are super exciting precisely because they are a "legacy" technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit.

https://x.com/karpathy/status/2026360908398862478

https://xcancel.com/karpathy/status/2026360908398862478
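Part of what makes the "legacy" property valuable is composition: an agent that can spawn a shell inherits the whole POSIX toolkit. A rough sketch of what "combine them" means in practice (notes.txt is a hypothetical file):

    import subprocess

    # Top ten most frequent words in a file, using nothing but legacy tools.
    pipeline = "tr -cs '[:alpha:]' '\\n' < notes.txt | sort | uniq -c | sort -rn | head"
    out = subprocess.run(["sh", "-c", pipeline],
                         capture_output=True, text=True, check=True)
    print(out.stdout)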

Thanks for sharing these; they are very interesting. I have found "making all app features accessible from a textual interface" to be quite challenging in certain domains, such as graphics editing tools. Many editing functions can be exposed properly as a CLI, but the content being edited is very hard to convert into text without losing its geometric meaning. Maybe this is where we truly need multimodal models, or where training on specialized data is needed.

> the content being edited is very hard to convert into text

For decades now, professional print shops have required text files describing the design to print from: PostScript is a textual page-description language.

And as every Danish pelican cyclist knows, graphics are at their most scalable as text vectors.

Inkscape does fine with these.
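
To make the point concrete: in an SVG, "move the circle right and double its size" is a plain text edit. A rough sketch with a hand-rolled SVG (not an Inkscape file, but Inkscape's SVG output is XML and parses the same way):

    import xml.etree.ElementTree as ET

    SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="30" cy="30" r="10" fill="red"/>
    </svg>"""

    ns = {"svg": "http://www.w3.org/2000/svg"}
    root = ET.fromstring(SVG)
    circle = root.find("svg:circle", ns)

    # The geometric edit is just attribute rewriting on text.
    circle.set("cx", str(float(circle.get("cx")) + 20))
    circle.set("r", str(float(circle.get("r")) * 2))

    ET.register_namespace("", ns["svg"])
    print(ET.tostring(root, encoding="unicode"))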