If you imagine the latent space as a map and the prompt as a sequence of directions towards clusters of knowledge, it makes sense that alignment can cull "pathways" through the latent space that emerged during pretraining.
If you imagine the latent space as a map and the prompt as a sequence of directions towards clusters of knowledge, it makes sense that alignment can cull "pathways" through the latent space that emerged during pretraining.