Would be cool to have a benchmark with actually unsolved math and science questions, although I suspect models are still quite a long way from that level.
Does folding a protein count? How about increasing performance at Go?
"Optimize this extremely nontrivial algorithm" would work. But unless the provided solution is novel, you can never be certain there wasn't data leakage. And anyway, at that point you're pretty obviously testing for superintelligence.
It's worth noting that neither of those was accomplished by an LLM.