What I do not understand is why we are not seeing specialized models that go down to single experts.
I do not need models that know how to program in Python, Rust, ... when I only use Go and HTML. So why are we not seeing models that have very specialized experts, where for instance:
* General interpreter model that holds context/memory
* Go model
* HTML model, if there is space in memory
* SQL model, if there is space in memory
If there is no space, the general interpreter model (GIM) swaps out the Go model for the HTML model, depending on where it is in the Agent task or the Edit/Ask code it's overseeing.
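To make the swapping idea concrete, here is a minimal sketch of how it could work, assuming a least-recently-used eviction policy (my choice, the comment above does not specify one). The class name `SpecialistCache`, the model names, and the sizes are all hypothetical, and the budget is meant to be whatever memory is left after the general model is loaded. No real weights are loaded; the cache only tracks names and sizes.

```python
from collections import OrderedDict
from typing import List


class SpecialistCache:
    """Tracks which specialist models fit in a fixed memory budget (LRU eviction)."""

    def __init__(self, budget_gb: float) -> None:
        self.budget_gb = budget_gb
        # Maps model name -> size in GB, ordered from least to most recently used.
        self.loaded: "OrderedDict[str, float]" = OrderedDict()

    def used_gb(self) -> float:
        return sum(self.loaded.values())

    def acquire(self, name: str, size_gb: float) -> List[str]:
        """Make sure `name` is resident; return the names evicted to make room."""
        if size_gb > self.budget_gb:
            raise ValueError(f"{name} ({size_gb} GB) can never fit in {self.budget_gb} GB")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return []
        evicted: List[str] = []
        while self.used_gb() + size_gb > self.budget_gb:
            victim, _ = self.loaded.popitem(last=False)  # drop least recently used
            evicted.append(victim)
        self.loaded[name] = size_gb  # a real system would load the weights here
        return evicted


# Hypothetical setup: 2.5 GB left over for specialists, each about 1 GB.
cache = SpecialistCache(budget_gb=2.5)
cache.acquire("go", 1.0)
cache.acquire("html", 1.0)
evicted = cache.acquire("sql", 1.0)  # does not fit, so the oldest specialist goes
```

In this run the Go model is the one that gets swapped out, because it was used least recently. A smarter policy could look at the current task instead of recency, which is closer to what the comment describes.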
Because the models are going to be very small, switching in and out of memory will be ultra fast. But most of the time we get very big Expert models that are still very generalized over an entire field.
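A rough back-of-the-envelope check of the "ultra fast" claim, as a sketch only: the time to move a model's weights is roughly its size divided by the transfer bandwidth. The function names are made up for illustration, and the bandwidth figures in the example (3 GB/s for reading from disk, 25 GB/s for host to GPU) are assumptions, not measurements.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in (decimal) GB: parameters x bits / 8."""
    return params_billion * bits_per_param / 8


def swap_seconds(params_billion: float, bits_per_param: int, bandwidth_gb_s: float) -> float:
    """Time to move the weights at a given bandwidth, ignoring latency and overhead."""
    return weights_gb(params_billion, bits_per_param) / bandwidth_gb_s


# Hypothetical 2B-parameter specialist quantized to 4 bits per parameter.
size = weights_gb(2, 4)               # about 1 GB of weights
from_disk = swap_seconds(2, 4, 3.0)   # assumed 3 GB/s disk read
from_ram = swap_seconds(2, 4, 25.0)   # assumed 25 GB/s host-to-GPU copy
print(f"{size:.1f} GB, disk: {from_disk:.2f} s, RAM: {from_ram:.2f} s")
```

Under these assumptions a swap is well under a second. What the estimate leaves out is that a freshly loaded model has to re-process the current context before it can continue, since its cache from earlier tokens cannot be reused from a different model.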
This can then be extended: if you have the memory, the models can combine their output on tasks... Maybe I am just too much of a noob in the field of understanding how LLMs work, but it feels like people are too often chasing the large models that companies like Anthropic/OpenAI etc. deploy. I understand why those big companies use insanely big models. They have the money to load them up over a cluster, they have the fast interconnect, and for them it's more efficient.
But from the bits and pieces that I see, people are more and more moving to tons of small 1 to 2B models to produce better results. See my argument above. Like I said, I have never really gone beyond paying for my Copilot subscription and running a bit of Ollama at home (don't have the time for the big stuff).
I think one of the issues is that LLMs can't have a "Go" model and an "HTML model". I mean, they can but what would that contain? It's not the language-specific features that make models large.
When models work on your code base, they do not "see" things like this, which is why they can go through an entire code base with variable names they have never seen before, function signatures they have never seen before, and directory structures they have never seen before and not have a problem.
You need that "this is a variable, which is being passed to a function which recursively does ..." part. This is not something language specific, it's the high-level understanding of how languages and systems operate. A variable is a variable whether in JavaScript or C++, and LLMs can "see" it as such. The details are different, but that layer of "this is a software interface" or "this is a function pointer" sits outside of the "Go" or "Python" or "C#" model.
I don't know how large the main model would have to be vs. the specialized models in order to pick this dynamic up.
You won't win much performance with a language-specific tokenizer/vocabulary; everything else benefits from a larger model size. You can get distilled models that will outperform or compete with your single-domain coding model.
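One way to see why a smaller vocabulary buys little on the size side is to estimate how much of a model sits in its token embeddings. This is only a sketch: the function name is hypothetical, and the example numbers (32,000 tokens, hidden size 4,096, 7 billion parameters in total) are assumed as a typical shape for a 7B-class model, not taken from any specific one.

```python
def embedding_share(vocab_size: int, hidden_dim: int, total_params: float,
                    tied: bool = False) -> float:
    """Fraction of all parameters held in the token embedding tables.

    Untied models have two tables (input embeddings and output projection);
    tied models share a single one.
    """
    tables = 1 if tied else 2
    return tables * vocab_size * hidden_dim / total_params


# Assumed 7B-class shape with a general-purpose vocabulary.
full_vocab = embedding_share(32_000, 4_096, 7e9)
# Same model with a hypothetical vocabulary cut to a quarter of the size.
small_vocab = embedding_share(8_000, 4_096, 7e9)
print(f"full: {full_vocab:.1%}, small: {small_vocab:.1%}")
```

With these assumed numbers the vocabulary-dependent part is a few percent of the model, so even a drastic cut saves very little. The rest of the parameters are in the layers that do the language-independent reasoning discussed above.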