It's quite sad that application interoperability requires parsing text passed via pipes instead of exchanging structured information.

Like others said, worse is better.

Your comment takes about 630 bits; a screenshot of it on my computer takes 2.1 MB, roughly 27,000 times the size. Either that's compute overhead the LLM has to burn before it can even think about the meaning of the text, or, if it's an end-to-end feedforward architecture, less capacity left for thinking about it. This is easy for us because the retina pre-processes the visual stream so that less than 0.8% of it is sent to the visual cortex, and because we have evolved to extract meaning from vision very quickly and efficiently. It's a prime example of Moravec's paradox.
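A quick back-of-the-envelope check of that ratio, using the two sizes quoted above (630 bits of text, a 2.1 MB screenshot) as the assumed inputs:

```python
# Assumed figures from the comment above: ~630 bits of text vs. a 2.1 MB screenshot.
text_bits = 630
screenshot_bits = 2.1e6 * 8  # 2.1 MB expressed in bits

# How many times larger the pixel representation is than the text.
ratio = screenshot_bits / text_bits
print(f"screenshot is ~{ratio:,.0f}x the size of the text")
```

Same information content either way; the difference is purely how much raw signal the model has to chew through before it gets to the meaning.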