> if you include the desired score in your prompt, the model will now strive to produce an answer that is consistent with that score

But you need a model to generate a score from an answer, and then fine-tune another model to generate an answer conditioned on the score. The first time the score comes at the end; the second time, at the beginning. This is also how Decision Transformer works: it constructs a sequence of (return-to-go, state, action) triples, where the return-to-go conditions the next action.

https://arxiv.org/pdf/2106.01345
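A minimal sketch of the two-pass data construction described above. The record format, the `[score: ...]` prefix, and the toy scoring model are all assumptions for illustration, not anything from the paper:

```python
def label_pass(answers, score_model):
    """Pass 1: a scoring model reads each answer and attaches a score."""
    return [(a, score_model(a)) for a in answers]

def flip_pass(scored):
    """Pass 2: move the score to the front so the fine-tuned model
    learns p(answer | score), mirroring Decision Transformer's
    conditioning of actions on the return-to-go."""
    return [f"[score: {s}]\n{a}" for a, s in scored]

# Toy stand-in for a real scoring model: longer answers score higher.
toy_score = lambda a: min(10, len(a.split()))

scored = label_pass(["Paris.", "The capital of France is Paris."], toy_score)
for example in flip_pass(scored):
    print(example)
```

At inference time you then prepend the score you *want* (e.g. `[score: 10]`) and let the model complete an answer consistent with it.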

By the same logic you could generate tags: style, author, venue, date. Some would be extracted from the source document, the others produced by classifiers. Then you flip the order and fine-tune a model that takes the tags before the answer. Now you've got an LLM you can condition on author and style.
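The tag version can be sketched the same way. The `[key: value]` header format and the `classify_style` helper are hypothetical; the point is just that metadata tags (extracted or classified) end up prefixed to the text before fine-tuning:

```python
def classify_style(text):
    """Toy stand-in for a real style classifier."""
    return "formal" if "therefore" in text.lower() else "casual"

def build_tagged_example(doc):
    """Prefix a document with tags so the model learns p(text | tags)."""
    tags = {
        "author": doc.get("author", "unknown"),  # extracted from metadata
        "venue": doc.get("venue", "unknown"),    # extracted from metadata
        "date": doc.get("date", "unknown"),      # extracted from metadata
        "style": classify_style(doc["text"]),    # produced by a classifier
    }
    header = " ".join(f"[{k}: {v}]" for k, v in tags.items())
    return f"{header}\n{doc['text']}"

print(build_tagged_example({
    "author": "jdoe", "venue": "blog", "date": "2021-06",
    "text": "Therefore, the result follows.",
}))
```

To condition generation, you write the tag header yourself (`[author: jdoe] [style: formal] ...`) and sample the continuation.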