A game like Go has a clearly defined objective (win the game or not). A network like you described can therefore be trained to give a score to each move. Point here is that assessing whether a given sentence sounds good to humans or not does not have a clearly defined objective, the only way we came up with so far is to ask real humans.