> LLMs are one large binary that does everything
Well, no, they aren't, but the orchestration frameworks they are embedded in sometimes are. Even then, a lot of that "everything" is often done by separate binaries that the framework is made aware of via some configuration or discovery mechanism.
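
To make that concrete, here is a rough sketch of the pattern rather than any particular framework's API. The orchestrator itself knows nothing about the tools except what a configuration table tells it, and each tool runs as its own process. The names `TOOL_CONFIG` and `call_tool` are made up for illustration, and the "binaries" are just Python one-liners so the example runs anywhere.

```python
import subprocess
import sys

# Hypothetical configuration: tool name -> argv of a separate executable.
# A real framework might load this from a file or a discovery endpoint;
# here each "binary" is a Python one-liner so the sketch is portable.
TOOL_CONFIG = {
    "upper": [sys.executable, "-c",
              "import sys; print(sys.stdin.read().upper(), end='')"],
    "wordcount": [sys.executable, "-c",
                  "import sys; print(len(sys.stdin.read().split()), end='')"],
}


def call_tool(name: str, payload: str) -> str:
    """Run the configured external process for `name`, feeding it `payload` on stdin."""
    argv = TOOL_CONFIG[name]  # raises KeyError for tools that were never registered
    result = subprocess.run(
        argv, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(call_tool("upper", "hello"))
    print(call_tool("wordcount", "one large binary"))
```

The point is that the work happens in the child processes; the orchestrator is just a dispatcher over whatever the configuration lists.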