Hacker News

colechristensen 4 hours ago

No, they're actually training weights on the context before it gets compacted. Context is still context; this splits the model into persistent weights and malleable ones that are periodically updated.

delis-thumbs-7e 4 hours ago

Wouldn’t that be extremely computationally expensive, considering how resource-intensive training is?

colechristensen 4 hours ago

No. Training a state-of-the-art model involves on the order of 10 trillion tokens.

We're talking about a step that updates weights based on, say, between 10k and 1M tokens.
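
As a rough back-of-envelope sketch of that gap: the token counts are the ones quoted above, and the assumption that cost scales roughly linearly with tokens processed is a simplification of mine, not a measurement. The name `relative_cost` is just illustrative.

```python
# Back-of-envelope comparison of token volumes.
# Assumption: training cost scales roughly linearly with tokens processed.
PRETRAINING_TOKENS = 10 * 10**12  # ~10 trillion tokens for a full training run
UPDATE_TOKENS_LOW = 10_000        # lower end of a context-sized update
UPDATE_TOKENS_HIGH = 1_000_000    # upper end of a context-sized update


def relative_cost(update_tokens, pretraining_tokens=PRETRAINING_TOKENS):
    """Return the update's token volume as a fraction of full pretraining."""
    return update_tokens / pretraining_tokens


for n in (UPDATE_TOKENS_LOW, UPDATE_TOKENS_HIGH):
    print(f"{n:>9,} tokens -> {relative_cost(n):.0e} of a full training run")
```

Under that assumption, one update step touches somewhere between a billionth and a ten-millionth of the tokens in a full training run.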

delis-thumbs-7e 4 hours ago

I learned something. Thank you!