https://en.wikipedia.org/wiki/Reinforcement_learning#Policy