Real-time fine-tuning would be one approach that probably helps with some things (improving performance at a task based on feedback) but is probably not well suited for others (remembering analogous situations, setting goals; it's not really clear how you fine-tune a context window into persistence in an LLM). There's also the concern that, right now, we seem to need many, many more examples in training data than a human would for the machine to get passably good at similar tasks.
I'd also say that long-term goal-oriented behavior isn't well represented in the training data. We have stories about it, sometimes, but a model would need to map its own self-state onto those stories to learn anything from them about what to do next.
I feel like LLMs are much smarter than we are at thinking "per symbol", but we have facilities for iteration, metacognition, and saving state that give us an advantage. I think we need to find clever, minimal ways to build these "looping" contexts.
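To make concrete what I mean by a "looping" context, here's a rough Python sketch. None of it is a real API: call_model is a stand-in for whatever model you'd actually use, and names like AgentState, step, and run are just illustrative. The point is only the shape of it, an outer loop that carries saved state across calls and asks the model to update that state each time.

```python
import json
from dataclasses import dataclass, field

# Hypothetical stand-in for an actual LLM call (OpenAI, llama.cpp, whatever).
def call_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to the model you use")

@dataclass
class AgentState:
    """Persistent self-state the loop carries between model calls."""
    goal: str
    notes: list[str] = field(default_factory=list)  # saved observations / scratchpad
    done: bool = False

def step(state: AgentState) -> AgentState:
    """One iteration: show the model its own saved state, ask it to act
    and to update that state (the metacognition part)."""
    prompt = (
        f"Goal: {state.goal}\n"
        f"Notes so far: {json.dumps(state.notes)}\n"
        'Reply with JSON: {"action": ..., "new_note": ..., "done": true/false}'
    )
    reply = json.loads(call_model(prompt))
    state.notes.append(reply["new_note"])  # persist what this step learned
    state.done = reply["done"]
    return state

def run(goal: str, max_steps: int = 10) -> AgentState:
    """The outer loop that a bare context window doesn't give you."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        if state.done:
            break
        state = step(state)
    return state
```

The interesting design question is what goes in that state object and how aggressively it gets compressed between iterations, not the loop itself, which is trivially small.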