In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.
But the model "shape" and computation graph itself doesn't change as a result of post-training. All that changes is the weights in the matrices.
In microgpt, there's no alignment. It's all pretraining (learning to predict the next token). But for production systems, models go through post-training, often with some sort of reinforcement learning which modifies the model so that it produces a different probability distribution over output tokens.
But the model "shape" and computation graph itself doesn't change as a result of post-training. All that changes is the weights in the matrices.