AlphaGo didn't have human feedback, but it did learn from humans before surpassing them. Specifically, it had a policy network that 'suggests good moves', and that network was initially trained to predict the next move in pro-level human games.
The entire point of AlphaZero was to eliminate this human influence and go with pure reinforcement learning from self-play (i.e. zero human influence, hence the name).
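For concreteness, the supervised step that AlphaZero removed looks roughly like this (a minimal PyTorch sketch; the dummy tensors stand in for real game records and the tiny architecture is illustrative, not DeepMind's): the board position is the input, and the move the professional actually played is the classification label.

```python
import torch
import torch.nn as nn

# Illustrative sketch of AlphaGo's supervised pretraining: treat each position
# from pro games as an input image and the pro's actual move as the target class.
class PolicyNet(nn.Module):
    def __init__(self, board_planes: int = 17, board_size: int = 19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(board_planes, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * board_size * board_size, board_size * board_size),
        )

    def forward(self, boards):   # boards: (batch, planes, 19, 19)
        return self.net(boards)  # logits over the 361 board points

policy = PolicyNet()
loss_fn = nn.CrossEntropyLoss()  # "predict the pro's move" is plain classification

boards = torch.randn(8, 17, 19, 19)      # dummy positions; real data would be game records
pro_moves = torch.randint(0, 361, (8,))  # index of the move the human actually played
loss = loss_fn(policy(boards), pro_moves)
loss.backward()
```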
A game like Go has a clearly defined objective: you either win or you don't. A network like the one you described can therefore be trained to give a score to each move against that objective. The point here is that assessing whether a given sentence sounds good to humans has no such clearly defined objective; the only signal we've come up with so far is to ask real humans.
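To make the contrast concrete, here's a minimal sketch (assumed PyTorch; the embedding inputs and names are illustrative) of how that human signal gets used in practice: since there's no win/loss label to read off, a reward model is trained on pairs of responses where a human said which one they preferred, a Bradley-Terry style objective.

```python
import torch
import torch.nn as nn

# Hypothetical reward model: maps a pooled sentence embedding to a scalar score.
# In Go the value target is given for free (win = 1, loss = 0); here the only
# training signal is which of two candidate responses a human preferred.
class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

def preference_loss(model, emb_chosen, emb_rejected):
    # Bradley-Terry objective: push the score of the human-preferred response
    # above the rejected one. The "label" is a human judgment, not a game rule.
    return -torch.nn.functional.logsigmoid(
        model(emb_chosen) - model(emb_rejected)
    ).mean()

model = RewardModel()
emb_chosen = torch.randn(4, 768)    # embeddings of the responses humans preferred
emb_rejected = torch.randn(4, 768)  # embeddings of the rejected alternatives
loss = preference_loss(model, emb_chosen, emb_rejected)
loss.backward()
```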
AlphaGo is an optimization over a closed problem. In theory, computers could always have beaten humans at such problems by exhaustive search; it's just that, without proper optimization, the human would die of old age before the computer finished its computation. AlphaGo cuts the computation down by smartly choosing the branches with the highest likelihood of being good, as rated by its networks.
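Roughly, that branch selection looks like this (a toy sketch of the PUCT rule that AlphaGo-style tree search uses; the field names are mine): the policy network's prior steers the search toward a handful of plausible moves instead of all ~250 legal ones per position.

```python
import math

def select_child(children, c_puct: float = 1.5):
    """Pick the next branch to explore.

    children: list of dicts with keys 'prior' (policy network's probability),
    'visits' (times explored so far), and 'value_sum' (accumulated outcomes).
    """
    total_visits = sum(ch["visits"] for ch in children)

    def puct(ch):
        # q: average value seen so far; u: exploration bonus weighted by the
        # prior, so unvisited but plausible moves still get tried.
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        return q + u

    return max(children, key=puct)

# Toy usage: the high-prior branch dominates unless its observed value sags.
children = [
    {"prior": 0.6, "visits": 10, "value_sum": 6.0},
    {"prior": 0.3, "visits": 2, "value_sum": 1.5},
    {"prior": 0.1, "visits": 0, "value_sum": 0.0},
]
best = select_child(children)
```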
Unlike the above, open problems can't be solved by computation in the combinatorial sense. Even humans can only make attempts, and LLMs spew out something that will most likely work, not something inherently correct.