Hacker News

> you CAN get good results for problems that can be reduced to a robust conformance suite.

If that's what is shown, then why doesn't it work on anything that has a sufficiently large test suite, presumably with time scaling linearly in the size of the suite? Why should we be selective, and on what basis?