> Does programming have a clear reward function? A vague description from a business person isn't it. By the time someone (a programmer?) has written a reward function that is clear enough, how would that function look compared to a program?
Well, to give an example: the complexity class NP is all about problems that have quick and simple verification, but finding solutions for many problems is still famously hard.
So there are at least some domains where this model would be a step forward.
But in that case, finding the solution is hard and you generally don't try. Instead, you try to get fairly close, and it's more difficult to verify that you've done so.
No. Most instances of most NP hard problems are easy to find solutions for. (It's actually really hard to eg construct a hard instance for the knapsack problem. And SAT solvers also tend to be really fast in practice.)
And in any case, there are plenty of problems in NP that are not NP hard, too.
Yes, approximation is also an important aspect of many practical problems.
There's also lots of problems where you can easily specify one direction of processing, but it's hard to figure out how to undo that transformation. So you can get plenty of training data.
I have a very simple integer linear program and it is really waiting for the heat death of the universe.
No, running it as a linear program is still slow.
I'm talking about small n=50 taking tens of minutes for a trivial linear program. Obviously the actual linear program is much bigger and scales quadratically in size, but still. N=50 is nothing.
Yes, there are also instances of problems in NP that are hard to solve in practice.
But here again: solutions to your problem are easy to verify, so it might be interesting to let an AI have a go at solving it.