The labs are spending hundreds of millions of dollars hiring people to perform many fairly random (but economically valuable) tasks in order to collect this tacit knowledge for RL.

It works really well.

It ceases to be tacit as soon as it is collected.

Maybe this rephrase will help: the proposed solution is to render all knowledge explicit.

> It ceases to be tacit as soon as it is collected.

I'm not sure.

If it is collected via preferences, then it isn't necessarily something that can be communicated (except in the LLM's latent space).

That still feels tacit to me.

To simplify that argument: the relationship between King and Queen in the Word2Vec latent space can easily be explicitly labelled.
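As a sketch of what "easily labelled" means here, the classic vector-offset analogy can be shown with toy vectors (the embeddings below are invented for illustration, constructed so the gender offset lines up; they are not from a trained Word2Vec model):

```python
import numpy as np

# Toy embedding table -- values are invented for illustration, not from a
# trained model.  They are constructed so that the offset (woman - man)
# matches the offset (queen - king), i.e. the relation is linearly labelled.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.2]),
    "woman": np.array([0.5, 0.2, 0.9]),
    "tsar":  np.array([0.8, 0.7, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c)."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

The point is that the relation is a single reusable direction in the space, so a human can name it ("gender") once and apply the label everywhere it appears.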

But the relationship between Napoleon and Tsar Alexander I also exists in that space and encodes much of the tacit knowledge about their relationship, yet it isn't as easily labelled (e.g., Google AI Mode says "Napoleon I and Tsar Alexander I had a volatile 'bromance' that shifted from mutual admiration to deep animosity, acting as a defining conflict of the Napoleonic Wars").

Word2Vec is a very simple model. In a more complex LLM, that deeper knowledge can be queried by asking questions, but you can never capture it all. Isn't that what "tacit knowledge" is?