Why are tokens not coloured? Would there just be too many params if we double the token count so the model could always tell input tokens from output tokens?

That's something I'm wondering as well. Not sure how it is with frontier models, but from what you can see on Huggingface, the "standard" method to distinguish token sources still seems to be special delimiter tokens or even just plain formatting.
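To make the delimiter approach concrete, here's a minimal sketch of how sources are marked in-band today (ChatML-style `<|im_start|>`/`<|im_end|>` markers shown for illustration; the exact tokens vary by model):

```python
# Sketch of the in-band delimiter approach: the "source" of each span is
# encoded as special marker tokens in the text itself, not in the embeddings.

def apply_chat_template(messages):
    # messages: list of {"role": ..., "content": ...} dicts
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

formatted = apply_chat_template([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
])
# Nothing but the surrounding markers distinguishes user text from model
# text here, which is exactly the weakness the thread is discussing.
```

The point is that the distinction exists only at the text level, so anything that can inject those marker strings (or just mimic the formatting) can impersonate another source.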

Are there technical reasons why you can't make the "source" of the token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc) a part of the feature vector - or even treat it as a different "modality"?

Or is this already being done in larger models?

By the nature of the LLM architecture, I think if you "colored" the input via tokens the model would about 85% "unlearn" the coloring anyhow. Which is to say, it's going to figure out that "test" in the two different colors is the same thing. It kind of has to; after all, you don't want to be talking about a "test" in your prompt and have the model be completely unable to connect that to the concept of "test" in its own replies. The coloring would end up as just another language in an already multi-language model. It might slightly help, but I doubt it would be a solution to the problem. And possibly at an unacceptable loss of capability, as it would burn some of its capacity on that "unlearning".

Instead of using just positional encodings, we absolutely should have speaker encodings added on top of tokens.
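A speaker encoding could work like the segment embeddings in BERT-style models: each token's input vector is the sum of its token embedding, its positional encoding, and a learned embedding for whoever produced it. A toy sketch (assumed architecture, dimensions shrunk for illustration, not any specific model):

```python
# Toy sketch: input vector = token embedding + positional encoding
# + learned "speaker" embedding identifying the token's source.
import math
import random

random.seed(0)

D = 8  # toy embedding dimension
VOCAB = {"hello": 0, "world": 1, "test": 2}
SPEAKERS = {"system": 0, "user": 1, "assistant": 2, "tool": 3}

token_emb = [[random.gauss(0, 0.02) for _ in range(D)] for _ in VOCAB]
speaker_emb = [[random.gauss(0, 0.02) for _ in range(D)] for _ in SPEAKERS]

def pos_enc(pos):
    # standard sinusoidal positional encoding
    enc = []
    for i in range(D):
        angle = pos / (10000 ** (2 * (i // 2) / D))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def embed(tokens, sources):
    # sources[i] names who produced tokens[i]
    out = []
    for pos, (tok, src) in enumerate(zip(tokens, sources)):
        t = token_emb[VOCAB[tok]]
        s = speaker_emb[SPEAKERS[src]]
        p = pos_enc(pos)
        out.append([a + b + c for a, b, c in zip(t, s, p)])
    return out

user_test = embed(["test"], ["user"])[0]
asst_test = embed(["test"], ["assistant"])[0]
# Same token at the same position gets a different input vector per speaker,
# while the shared token embedding keeps "test" related across sources.
```

Unlike delimiter tokens, this marks every token's source out-of-band, so a tool result can't "become" a user message just by containing the right marker string.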

Since tool results are the main prompt injection vector, I think you'd want to distinguish them from user messages. And by the time you go that far, you need colors for those two, plus system messages, plus thinking/responses. I have to think it's been tried and it just cost too much capability, but it may be the best opportunity to improve at some point.

Because then the training data would have to be coloured

I think OpenAI and Anthropic probably have a lot of that lying around by now.

So most training data would be grey and a little bit coloured? Ok, that sounds plausible. But then maybe they tried it and the current models already get it right 99.99% of the time, so observing any improvement would be very hard.

They have a lot of data in the form: user input, LLM output. But then the model learns what previous LLM models produced, with all their flaws. The core LLM premise is that it learns from all available human text.

This hasn't been the full story for years now. All SOTA models are strongly post-trained with reinforcement learning to improve performance on specific problems and interaction patterns.

The vast majority of this training data is generated synthetically.

This has the potential to improve things a lot, though there would still be a failure mode when the user quotes the model or the model (e.g. in thinking) quotes the user.

I’ve been curious about this too - obvious performance overhead to having an internal/external channel, but it might make training away this class of problems easier

You would have to train it three times for two colors:

each by itself, then with both interacting.

2!

The models are already massively overtrained. Perhaps you could do something like initialise the two new token sets based on the shared data, then use existing chat logs to train it to understand the difference between input and output content? That's only a single extra phase.

You should be able to first train it on generic text once, then duplicate the input layer and fine-tune on conversation.
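The duplicate-then-fine-tune idea could be as simple as copying the pretrained embedding table once per source, so every "color" starts from the same learned representation and only diverges during conversational fine-tuning. A hypothetical sketch (toy data, illustrating the copy step only, not a training loop):

```python
# Hypothetical sketch: after generic pretraining, clone the learned token
# embedding table into one independent copy per source ("color"), so each
# color starts identical and only diverges during fine-tuning.
import copy

def colorize_embeddings(base_table, n_sources):
    # base_table: list of embedding vectors, one per vocab entry
    return [copy.deepcopy(base_table) for _ in range(n_sources)]

base = [[0.1, 0.2], [0.3, 0.4]]  # toy "pretrained" embeddings, vocab of 2
colored = colorize_embeddings(base, n_sources=2)

# Both colors start identical to the shared table...
assert colored[0] == base and colored[1] == base

# ...but fine-tuning can now update one color without touching the other.
colored[1][0][0] += 0.05
assert colored[0][0][0] == 0.1
```

Starting both copies from the same weights is what would keep "test" in either color pointing at the same concept initially, addressing the "unlearning" worry upthread.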