It's surprising that this is prompt-based and not RLHF.

I'm not an LLM guy, but as far as I understand, RLHF did a good job of converting a base model into a chat (instruct) model, and a chat/base model into a thinking model.

Both of these examples are about the shape of the response and the content used to fill it. There are still many different ways these could be filled that we haven't seen yet.

Generating an answer step by step and letting users dive into those steps is one of those ways, and RLHF (or whatever similar techniques are used) seems like a good fit for it.

Prompting feels like a temporary solution here, much like how "think step by step" was first seen in prompts.
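To make the prompt-based approach concrete, here's a minimal sketch of what it might look like: a system prompt asks the model to wrap each step in tags, and the product code parses them out for a collapsible UI. The tag format, prompt wording, and sample reply are all hypothetical, just to illustrate the pattern.

```python
import re

# Hypothetical system prompt: elicit expandable "steps" purely via prompting,
# no post-training required.
SYSTEM_PROMPT = (
    'Answer step by step. Wrap each step in <step title="...">...</step> '
    'tags so the UI can render them as collapsible sections.'
)

def parse_steps(response: str) -> list[tuple[str, str]]:
    """Split a model response into (title, body) pairs for the UI."""
    return [
        (m.group(1), m.group(2).strip())
        for m in re.finditer(r'<step title="([^"]+)">(.*?)</step>', response, re.S)
    ]

# A fabricated model reply that follows the prompted format:
reply = (
    '<step title="Recall the formula">Area = pi * r^2.</step>'
    '<step title="Plug in r = 2">Area = 4 * pi.</step>'
)
steps = parse_steps(reply)
```

The fragility is the point: nothing guarantees the model sticks to the tag format, which is exactly why post-training the structure in (as with chain-of-thought) tends to win long-term.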

Also, doing RLHF/post-training to change these structures makes it a moat, and it's expensive; only the AI labs can do it.

The problem is you'd then have to do all the product-specific post training again once the new base model comes out a few months later. I think they'd rather just have general models that are trained to follow instructions well and can adapt to any kind of prompt/response pattern.