I'm pretty interested in the theorem proving/scientific research aspect of this.
Do you think it's possible that some version of LLM technology could discover new physical theories (that are experimentally verifiable), like, for example, a new theory of quantum gravity, by exploring the mathematical space?
Edit: this is just incredibly exciting to think about. I'm not an "accelerationist" but the "singularity" has never felt closer...
My hunch is that LLMs are nowhere near intelligent enough to make brilliant conceptual leaps. At least not anytime soon.
Where I think AI models might prove useful is in those cases where the problem is well defined, where formal methods can be used to validate the correctness of (partial) solutions, and where the search space is so large that work towards a proof is based on "vibes" or intuition. Vibes can be trained through reinforcement learning.
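To make that concrete, here's a minimal sketch of the division of labor I mean: a trusted verifier plus a "vibes" heuristic that only decides which branch to explore first. The heuristic below is a hand-written stand-in for a trained model, and all the names are made up for illustration:

    import heapq

    GOAL = 1
    TACTICS = {  # sound rewrite rules: applying one is always valid
        "halve": lambda n: n // 2 if n % 2 == 0 else None,
        "dec":   lambda n: n - 1 if n > 1 else None,
    }

    def vibes(n):
        """Stand-in for a learned value function: scores how promising a
        partial proof state looks. RL would train this from search traces."""
        return n  # toy heuristic: smaller is closer to the goal

    def search(start, budget=100_000):
        frontier = [(vibes(start), start, [])]
        seen = {start}
        while frontier and budget:
            budget -= 1
            _, state, steps = heapq.heappop(frontier)
            if state == GOAL:      # the verifier accepts: derivation found
                return steps
            for name, rule in TACTICS.items():
                nxt = rule(state)  # every step is checked, not vibed
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (vibes(nxt), nxt, steps + [name]))
        return None                # out of budget; says nothing either way

    print(search(99))  # e.g. ['dec', 'halve', 'dec', ...]

The key property is that a bad heuristic only wastes search budget; it can never produce an invalid derivation. That's exactly what makes the vibes safe to train with RL.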
Some computer-assisted proofs are already hundreds of pages or gigabytes long. I think it's a pretty safe bet that really long and convoluted proofs that can only be verified by computers will become more common.
https://en.wikipedia.org/wiki/Computer-assisted_proof
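In miniature, such a proof is just a statement reduced to finitely many machine-checked cases that nobody reads by hand. A toy version, checking Goldbach's conjecture up to a small bound (the real published verifications reach around 4e18; this is illustrative only):

    def is_prime(n):
        return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def goldbach_holds(n):
        """True if the even number n is a sum of two primes."""
        return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

    BOUND = 10_000
    assert all(goldbach_holds(n) for n in range(4, BOUND + 1, 2))
    print(f"checked every even n up to {BOUND}")

Trusting a result like this means trusting the checker, not reading the case analysis.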
They don't need to be intelligent to make conceptual leaps. DeepMind stuff just does a bunch of random RL experiments until it finds something that works.
I think the answer is almost certainly no, and is mostly unrelated to how smart LLMs can get. The issue is that any theory of quantum gravity would only be testable with equipment that is much, much more complex than what we have today. So even if the AI came up with some beautifully simple theory, testing that its predictions are correct is still not going to be feasible for a very long time.
Now, it is possible that it could come up with some theory that is radically different from current theories, where quantum gravity arises very naturally, and that fits all of the other predictions of the current theories that we can measure - so we would have good reason to believe the new theory and consider quantum gravity probably solved. But whether such a theory even exists - one that is not mathematically equivalent to QM/QFT, yet still matches all confirmed predictions - is literally impossible to predict.
Additionally, nothing in AI tech so far suggests that current approaches should be any good at this type of task. The only tasks at which AI has truly excelled are extremely well-defined problems with a huge but finite search space, where partial solutions are easy to grade. Image recognition, game playing, and text translation are the great successes of AI. And performance drops sharply with the uncertainty of the search space and with the difficulty of judging a partial solution.
Finding physical theories is nothing like any of these problems. The search space is literally infinite, partial solutions are almost impossible to judge, and even judging whether a complete solution is good is extremely difficult. Sure, you can check if it's mathematically coherent, but that tells you nothing about whether it describes the physical world correctly. And there are plenty of good physical theories that aren't fully formally proven, or weren't at the time they were invented - so mathematical rigour isn't even a very strong signal (e.g. Newton's infinitesimal calculus wasn't considered rigorous until the 19th century, by which time his theories had long since been rewritten in other terms; the Dirac delta wasn't given a precise mathematical definition until decades after physicists started using it; and I think QFT still uses some iffy math even today).
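To make the Dirac delta example concrete (my dates are from memory, so treat them as approximate): physicists computed with rules like the first line below from the late 1920s on, even though no ordinary function satisfies them; only Schwartz's distribution theory in the 1940s made it rigorous by redefining delta as a functional on test functions:

    \delta(x) = 0 \;\text{ for } x \neq 0, \qquad
    \int_{-\infty}^{\infty} \delta(x)\, f(x)\, \mathrm{d}x = f(0)

    \langle \delta, \varphi \rangle := \varphi(0)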
> Sure, you can check if it's mathematically coherent, but that tells you nothing about whether it describes the physical world correctly.
This is a very good point I think a lot of people miss. (Including some who should know better.) Pontificating about speculative physics is all right for Aristotle, but you need actual experiments to ground your results.
Current LLMs are optimized to produce the output most resembling what a human would generate, not to surpass it.
The output most pleasing to a human, which is both better and worse.
Better, because we can spot mistakes in work we couldn't have created ourselves. Think art: most of us can't draw hands, but we can spot when Stable Diffusion gets them wrong.
Worse also, because there are many things which are "common sense" and wrong, e.g. https://en.wikipedia.org/wiki/Category:Paradoxes_in_economic..., and we would collectively down-vote a perfectly accurate model of reality for violating our beliefs.
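For anyone wondering where "pleasing" enters mechanically: under the usual RLHF recipe, a reward model is fit to human A/B preferences, and the LLM is then tuned to maximize that reward. A toy sketch of the preference loss (names made up for illustration, not any particular library's API):

    import math

    def preference_loss(reward_chosen, reward_rejected):
        """Bradley-Terry negative log-likelihood that the human-preferred
        response outranks the rejected one."""
        margin = reward_chosen - reward_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # Annotators preferred a plausible-sounding answer over a correct but
    # counterintuitive one; minimizing the loss pushes the reward model
    # to agree with them:
    print(preference_loss(reward_chosen=0.3, reward_rejected=1.2))  # ~1.24

A perfectly accurate but belief-violating answer lands on the "rejected" side often enough that the model learns to avoid it - the collective down-voting described above, baked directly into the objective.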
IIRC, people have tried similar things using something close to brute force. Nothing of real significance has been found. One problem is that there are infinitely many physically and mathematically correct theorems that add no practical value.
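A toy illustration of why "true and machine-checkable" is such a weak filter (purely made-up code, not any real system):

    from itertools import count, islice

    def true_but_useless():
        """Endless stream of verifiably true arithmetic identities."""
        for n in count(1):
            yield f"{n} + 0 == {n}"
            yield f"{n} * 1 == {n}"

    for statement in islice(true_but_useless(), 6):
        assert eval(statement)  # every one passes the checker
        print(statement)

The checker is satisfied forever; significance never shows up in the signal.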