One thing I’ve wondered about is what the “gap” between current transformer-based LLMs and optimal sequence prediction looks like.
To clarify, current LLMs (without RLHF, etc.) have a very straightforward training objective: minimize the cross-entropy of next-token prediction on the training data. If we assume our training data is sampled from some computable distribution, then Solomonoff induction achieves optimal sequence prediction.
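For concreteness, that objective in its usual form (standard notation; x_{1:T} is the training corpus and θ the model parameters):

```latex
% Average next-token cross-entropy over the training corpus x_{1:T}.
\mathcal{L}(\theta) \;=\; -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
```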
Assuming we had an oracle that could perform SI (since it’s uncomputable), how different would a conversation with GPT-4 be from a conversation with the SI oracle, given the same training data?
We know there would be at least a few notable differences. For example, we could give SI the first 100 digits of pi, and it would give us as many more digits as we wanted. Current transformer models cannot (directly) do this. We could also give SI a hash and ask for a string that hashes to that value. Clearly a lot of hard, formally specified problems could be solved this way.
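To make the hash example concrete (a toy sketch only, not a claim about how an SI oracle would work internally): the task is trivial to state and to verify, but finding an answer requires search that grows exponentially with preimage length, which is the sense in which it is formally specified yet out of reach for a transformer's forward pass.

```python
import hashlib
import itertools
import string
from typing import Optional

def find_preimage(target_hex: str, max_len: int = 4) -> Optional[str]:
    """Brute-force search for a short lowercase ASCII string whose SHA-256 digest
    matches target_hex. Verifying a candidate is one hash call; finding one is
    exponential in its length."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

# Example: recover the (short, known) preimage "pi" from its hash.
target = hashlib.sha256(b"pi").hexdigest()
print(find_preimage(target))  # -> pi
```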
But how different would SI and GPT-4 appear in response to everyday chit-chat? What if we ask the SI-based sequence predictor how to cure cancer? Is the “most probable” answer to that question, given its internet-scraped training data, an answer that humans find satisfying? Probably not, which is why AGI requires something beyond just optimal sequence prediction. It requires a really good objective function.
My first inclination for this human-oriented objective function is something like “maximize the probability of providing an answer that the user of the model finds satisfying”. But there is more than one user, so which set of humans do we consider, and which aggregation do we use (average satisfaction, p99 satisfaction, etc.)?
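A toy sketch of why the aggregation choice matters (the satisfaction scores below are made up purely for illustration): the same set of users yields very different objective values under different aggregators.

```python
import numpy as np

# Hypothetical per-user satisfaction scores in [0, 1] -- illustrative only.
satisfaction = np.array([0.95, 0.90, 0.92, 0.88, 0.15, 0.97, 0.91, 0.05, 0.93, 0.89])

aggregations = {
    "mean": satisfaction.mean(),              # utilitarian-style average
    "p99": np.percentile(satisfaction, 99),   # tracks the most-satisfied users
    "p1": np.percentile(satisfaction, 1),     # tracks the least-satisfied users
    "min": satisfaction.min(),                # pure worst-case (maximin)
}

for name, value in aggregations.items():
    print(f"{name:>4}: {value:.2f}")
```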
So then I’m inclined to frame the problem in terms of well-being: “maximize aggregate human happiness over all time” or “minimize the maximum of human suffering over all time”. But each of these objective functions has notable flaws.
Karpathy seems to be hinting toward this in his post, but selecting an overall optimal objective function for human purposes seems to be an incredibly difficult philosophical problem. For every objective function I can think of, I can also immediately think of flaws with it.
>But how different would SI and GPT-4 appear in response to everyday chit-chat? What if we ask the SI-based sequence predictor how to cure cancer?
I suspect that a lot of LLM prompts that elicit useful capabilities out of imperfect sequence predictors like GPT-4 are in fact most likely to show up in the context of "prompting an LLM" rather than being likely to show up "in the wild".
As such, to predict the token after a prompt like that, an SI-based sequence predictor would want to predict the output of whatever language model was most likely to be prompted, conditional on the prompt/response pair making it into the training set.
If the answer to "what model was most likely to be prompted" was "the SI-based sequence predictor", then it needs to predict which of its own likely outputs are likely to make it into the training set, which requires it to have a probability distribution over its own output. I think the "did the model successfully predict the next token" reward function is underspecified in that case.
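A rough way to write down that underspecification (my own notation, just to make the circularity visible): let Q be the predictor's own response distribution, and let the training distribution be Q filtered by whatever process decides which prompt/response pairs get scraped back into the corpus.

```latex
% Q(r | c): the predictor's own distribution over responses r to prompt c.
% incl(c, r): probability that the pair (c, r) makes it back into the training set.
P_{\mathrm{train}}(r \mid c) \;\propto\; Q(r \mid c)\,\mathrm{incl}(c, r)

% "Predict the next token perfectly" then asks for Q = P_train, i.e. a fixed point
%   Q(r | c) \propto Q(r | c) * incl(c, r),
% which holds for any Q supported on a level set of incl(c, .) -- many distinct
% solutions, hence the underspecification.
```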
There are many cases like this where the behavior of the system in the limit of perfect performance at the objective is undesirable. Fortunately for us, we live in a finite universe and apply finite amounts of optimization power, and lots of things that are useless or malign in the limit are useful in the finite-but-potentially-quite-large regime.
> What if we ask the SI-based sequence predictor how to cure cancer? Is the “most probable” answer to that question, given its internet-scraped training data, an answer that humans find satisfying?
You defined your predictor as being able to minimize mathematical definitions following some unspecified algebra; why didn't you also define it as being able to run chemical and pharmacological simulations through some unspecified model?
I don’t follow—what do you mean by unspecified algebra? Solomonoff induction is well-defined. I’m just asking how the responses of a chatbot using Solomonoff induction for sequence prediction would differ from those using a transformer model, given the same training data. I can specify mathematically if that makes it clearer…
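For reference, the standard specification I have in mind (the Solomonoff–Levin universal semimeasure over a universal monotone machine U, used directly as a sequence predictor):

```latex
% Sum over all programs p whose output on the universal monotone machine U
% begins with the string x; |p| is the program's length in bits.
M(x) \;=\; \sum_{p \,:\, U(p) \,=\, x\ast} 2^{-\lvert p \rvert}

% Prediction is just conditioning M on the observed prefix x_{1:t}.
M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}
```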
Alternatively, you include information about the user of the model as part of the context to the inference query, so that the model can uniquely optimize its answer for that user.
Imagine if you could give a model "how you think", plus your knowledge, experiences, and values, as context: it would be "Explain Like I'm 5" on steroids. Both exciting and terrifying at the same time.
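A sketch of what that could look like in practice (the UserProfile fields and build_prompt helper here are invented for illustration, not any real API):

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    background: str       # what the user already knows
    reasoning_style: str  # how they like explanations structured
    values: str           # what they care about in an answer

def build_prompt(profile: UserProfile, question: str) -> str:
    """Fold a user's profile into the inference query so the answer can be tailored."""
    return (
        f"User background: {profile.background}\n"
        f"Preferred explanation style: {profile.reasoning_style}\n"
        f"Values: {profile.values}\n\n"
        f"Question: {question}\n"
        "Answer in a way tailored to this specific user."
    )

profile = UserProfile(
    background="software engineer, no biology training",
    reasoning_style="analogies first, then precise detail",
    values="wants mechanisms, not just conclusions",
)
print(build_prompt(profile, "How do mRNA vaccines work?"))
```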
> Alternatively, you include information about the user of the model as part of the context to the inference query
That was sort of implicit in my first suggestion for an objective function, but do you really want the model to be optimal on a per-user basis? There are a lot of bad people out there. That’s why I switched to an objective function that considers all of humanity’s needs together as a whole.
Objective Function: Optimize on a per-user basis. Constraints: Output generated must be considered legal in user's country.
Both things can coexist without conflicting with each other.
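Written out as a constrained objective (my paraphrase of the suggestion; sat and legal are placeholder black-box functions, not anything defined in this thread):

```latex
% \pi: the model's response policy; u: the user; c: the user's query.
\max_{\pi}\;\; \mathbb{E}_{r \sim \pi(\cdot \mid c,\, u)}\big[\,\mathrm{sat}(u, r)\,\big]
\quad \text{subject to} \quad
\Pr_{r \sim \pi(\cdot \mid c,\, u)}\big[\,\mathrm{legal}\!\left(r,\ \mathrm{country}(u)\right)\big] \;=\; 1
```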
My (hot) take is that I personally don't believe any LLM that can fit on a single GPU is capable of significant harm. An LLM that fits on an 8xH100 system, perhaps, but I am more concerned about other ways an individual intent on harming others could spend ~$300k. Besides, looking up how to make napalm on Google and then actually making it and using it to harm others doesn't make Google the one responsible, imo.