Models already have some limited means of refinement available to them: augment a model with any form of external memory, and it can learn by writing to that memory and then reading relevant parts of the accumulated knowledge back in the future. Of course, this is far more rigid than what biological brains can do, but it isn’t nothing.
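
Concretely, the loop I have in mind is something like the sketch below. It's illustrative only: `ExternalMemory`, `answer_with_memory`, and `model.generate` are made-up names, not any real framework's API.

```python
from typing import List

class ExternalMemory:
    """Trivially simple append-only store with naive keyword retrieval (illustration only)."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def write(self, note: str) -> None:
        self.entries.append(note)

    def retrieve(self, query: str, k: int = 3) -> List[str]:
        # Naive relevance: count words shared with the query.
        # A real system would use embeddings, but the principle is the same.
        query_words = set(query.lower().split())
        def overlap(entry: str) -> int:
            return len(set(entry.lower().split()) & query_words)
        return sorted(self.entries, key=overlap, reverse=True)[:k]


def answer_with_memory(model, task: str, memory: ExternalMemory) -> str:
    """One step of the loop: read relevant notes, answer, write a new note."""
    context = "\n".join(memory.retrieve(task))
    prompt = f"Relevant notes:\n{context}\n\nTask: {task}"
    answer = model.generate(prompt)  # `model.generate` is a stand-in, not a real API call
    memory.write(f"Task: {task}\nWhat seemed to work: {answer}")
    return answer
```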

Does “distributional drift and mode collapse” still happen if the outputs are filtered with respect to some external ground truth - e.g. human preferences, or even (in certain restricted domains such as coding) automated evaluations?
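
To make the question concrete, here is roughly what I mean by “filtered” in the coding case: a self-generated solution only survives if an external check (e.g. unit tests) verifies it. Sketch only; `generate_candidates` and `passes_checks` are placeholders for whatever the model and verifier actually are.

```python
from typing import Callable, List, Tuple

def filter_by_ground_truth(
    tasks: List[str],
    generate_candidates: Callable[[str], List[str]],  # model samples per task (placeholder)
    passes_checks: Callable[[str, str], bool],        # external verifier, e.g. unit tests (placeholder)
) -> List[Tuple[str, str]]:
    """Keep only (task, solution) pairs that the external check actually verified."""
    kept: List[Tuple[str, str]] = []
    for task in tasks:
        for candidate in generate_candidates(task):
            if passes_checks(task, candidate):
                kept.append((task, candidate))
                break  # one verified solution per task is enough for this sketch
    return kept
```

The filter injects information the model didn’t generate itself, which is why I’m asking whether that changes the picture.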

I wasn’t talking about human reinforcement.

The discussion has been about CoT in LLMs, so I’ve been referring to the model in isolation from the start.

Here’s how I currently understand the structure of the thread (apologies if I’ve misread anything):

“Is CoT actually thinking?” (my earlier comment)

→ “Yes, it is thinking.”

  → “It might be thinking.”

    → “Under that analogy, self-training on its own CoT should work — but empirically it doesn’t.”

      → “Maybe it would work if you add external memory with human or automated filtering?”

Regarding external memory:

Without an external supervisor, whatever gets written into that memory is still the model’s own self-generated output, which brings us back to the original problem.