Hacker News

Yes, same for maths. As long as a true reward 'surface' can be optimized. Approximate rewards are similar to approximate and non admissible heuristics,search eventually misses true optimal states and favors wrong ones, with side effects in very large state spaces.