To the “LLMs just interpolate their training data” crowd:

Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.

I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.

I'd hope most functional adults understand that the Fields Medal and basically every other annual "prize" out there is awarded to both "recombinant" innovations and "new-dimensional thinking" innovations. Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.

I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.

> Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.

In fact, they are rarer, specifically because they're harder to produce. This is also why it is much harder to get LLMs to be really innovative. Human intelligence is a lot of things; it is deeply multifaceted.

Also, I'm not sure why CS people act like axioms are where you start. Finding them is very, very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of them. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all is, your image is too ideal (pun(s) intended).

But at the same time I get it: it is a level of math (and science) most people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but it's easy to demonstrate to people that they don't understand math. Granted, most people don't want to learn, which is perfectly okay too.

I agree with almost all of what you have stated, save for a minor nitpick: I frankly don't think most functional adults think about the Fields Medal, similar annual prizes, or the qualities of the innovations of their candidate pools. I also think that that's totally okay. I think among a certain learned cohort of adults it's okay to hope that, and I think it's okay to imagine an idealized world where having an opinion on this sort of matter is a baseline, but I don't think it's realistic or fair to imply that (what I believe handwavily to be a majority of) adults are nonfunctional for not sharing this understanding.

I think an LLM trained on pre-calculus material would easily stumble into reinventing at least early calculus. It's already pretty easy for students to stumble into calculus from solid enough fundamentals.

We even think that the Babylonian astronomers figured out they could integrate velocity over time to predict the position of Jupiter.
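To make "integrate velocity over time" concrete: if a planet's daily motion falls linearly from v0 to v1 over T days, the distance covered is the area of the trapezoid under the velocity-time line. A minimal sketch, using made-up numbers (0.2 slowing to 0.15 degrees/day over 60 days), not values from any tablet, that checks the trapezoid formula against a midpoint Riemann sum:

```python
def position(t: float, v0: float, v1: float, T: float) -> float:
    """Distance covered after t days (0 <= t <= T) when daily motion falls
    linearly from v0 to v1 over T days.

    This is the integral of v(s) = v0 + (v1 - v0) * s / T from 0 to t;
    at t = T it equals the trapezoid area (v0 + v1) / 2 * T.
    """
    return v0 * t + (v1 - v0) * t * t / (2 * T)


def midpoint_riemann_sum(t: float, v0: float, v1: float, T: float, steps: int = 10_000) -> float:
    """Approximate the same area by summing thin rectangles (midpoint rule)."""
    dt = t / steps
    return sum((v0 + (v1 - v0) * (i + 0.5) * dt / T) * dt for i in range(steps))


if __name__ == "__main__":
    # Hypothetical inputs: 0.2 deg/day slowing to 0.15 deg/day over 60 days.
    v0, v1, T = 0.2, 0.15, 60.0
    print("trapezoid area:", position(T, v0, v1, T))
    print("Riemann sum:   ", midpoint_riemann_sum(T, v0, v1, T))
    print("after 30 days: ", position(30.0, v0, v1, T))
```

Note that the position after 30 days is more than half the total, because the planet moves faster early on.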

> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus.

Yes but that is because there was not enough text available to create an intelligent LLM to begin with.

To keep my usual rant short: I think you’re assuming a categorical distinction between those two types of innovations that just doesn’t exist. Calculus certainly required some fundamental paradigm shifts, but there’s a reason that they didn’t have to make up many words wholesale to explain it!

Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.

Also also, this is a tiny nitpick, but: the Fields Medal is every 4 years, AFAIR. For that exact reason, probably!

I think your comment about inventing new words is an interesting one. One of the things that I believe limits our ability to discover new ideas is our ability to describe related concepts. For example, the reason we still can't have clear discussions on consciousness is probably partly due to the fact that the necessary concepts haven't been cemented in language. We need new language before we can describe consciousness.

I would guess LLMs are limited in their ability to be genuinely novel because they are trained on a fixed language. It makes research into the internal languages developed by LLMs during training all the more interesting.

We have had LLMs for much longer than 3 years.

It took humans thousands of years, then hundreds of years, to come to terms with very basic concepts about numbers.

It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.

People can't separate themselves from the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because it didn't exist yet.

Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard when we don't yet have the concepts that the "invisible" things right in front of us will teach us.

Anyone in the arts is aware that creativity is not the new, it is the repackaging of what already exists into something that is itself new.

Except for "Being John Malkovich". That movie was way out there on its own.

It's "just" a Man-vs-Self story, of the ~7 story archetypes out there.

It's why the invention of teaching has been so important. Took a long time for humans to develop calculus. A long time to then refine it and make it much more useful. But then in a year or two an average person can learn what took hundreds of years to invent. It's crazy to equate these tasks as being the same. Even incremental innovation is difficult. You have to see something billions of people haven't. But there's also paradigm shifts and well... if you're not considered crazy at first then did you really shift a paradigm?

When people say this what they mean is that we've had plausibly useful LLMs for around three years, and I would say that is basically true. The stuff before 2023 could barely be classified above the level of an interesting toy.

> When people say this what they mean is that we've had plausibly useful LLMs for around three years, and I would say that is basically true.

No, we haven't, for any reasonable definition of L.

OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".

https://openai.com/index/better-language-models/

Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.

The term doesn't change its meaning because something new comes along.

The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.

The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.

Sure we do, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been around for more than a decade already.

You are confused by what the L and L mean in LLM, or which data set she created, or both, or in general.

Fine, 8 years? That's not a long time.

The fundamental paradigm shift is the categorical distinction. And what would constitute many new words for you? It introduced a bunch of concepts and terms which we take for granted today, including "derivative", "integral", "infinitesimal", "limit" and even "function", the latter two not new words, but what does it matter? – the associated meanings were new.

There was a lot new in calculus, but it also didn't come out of nowhere.

That Newton and Leibniz came up with similar ideas in parallel, independently, around the same time (what are the odds?), supports that.

https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculu...

> I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.)

The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?

GP was stating that they don't believe this would happen (I don't either), but also to make the point that it's a falsifiable view. (At least in theory. In practice, there probably won't even be enough historical text to train an LLM on). No, I don't think it would be falsified. Asking what if I'm wrong is kind of redundant. If I'm wrong, I'm wrong, duh.

How are you going to train a frontier-level LLM with no references to post-1700 mathematics?

"frontier level" is doing a lot of work there, but the idea would be to only feed it earlier sources.

There are people working on this.

e.g. https://github.com/haykgrigo3/TimeCapsuleLLM

The problem is that the amount of data with that cutoff is really too minuscule to produce anything powerful. You might be able to generate a lot of 1700s-sounding data, but you'd have to be careful not to introduce newer concepts or ways of thinking in that synthetic data. A lot of modern texts talk about rates of change and the like in ways that are probably influenced by preexisting knowledge of calculus.

Doesn't it prove GP's point then, that LLMs themselves simply aren't capable of creating/proving new theories and axioms?

Without passing opinion on GP's point, I think that just proves it's hard to establish a data set that doesn't bias toward the result you're hoping to find.

Time cutoff LLMs are regularly posted to HN. It takes just one success to prove feasibility.

Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.

Archimedes was close.

I don't think it's really feasible: there just isn't enough training data from before calculus. I would guess all the mathematical and philosophical texts available to Newton and Leibniz would fit on a CD-ROM with loads of space to spare.

I like to think of it as:

Imagine every bit of human knowledge as a discrete point within some large high-dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in that space. An LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside the hull. Then there are points completely outside the hull; whether or not LLMs can reach these is IMO up for debate.

Reaching new points inside the hull is still really useful! Many new discoveries and proofs are these new points inside the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but that you can arrive at by using what we already have as starting points. Many math proofs and Nobel Prize-winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.

Then there are the points completely outside of hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
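The convex-hull picture can be made literal in two dimensions. A minimal sketch, assuming "knowledge" is just a handful of hypothetical 2D points (a big simplification): a convex combination of known points (non-negative weights summing to 1) lands inside the hull even when it is not one of the original points, while a point like (5, 1) would need extrapolation. The helper names are illustrative, and the hull is computed with Andrew's monotone chain.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull: List[Point], q: Point) -> bool:
    """True if q lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def interpolate(points: List[Point], weights: List[float]) -> Point:
    """Convex combination of points: weights must be non-negative and sum to 1."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1) < 1e-9
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)


if __name__ == "__main__":
    known = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # hypothetical "known" points
    hull = convex_hull(known)
    new_point = interpolate([(4, 0), (0, 3)], [0.5, 0.5])  # (2.0, 1.5): not in `known`
    print(hull)
    print(new_point, inside_hull(hull, new_point))  # interpolation stays inside
    print((5, 1), inside_hull(hull, (5, 1)))  # would need extrapolation
```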

A lot of the space outside of the convex hull is just untried things. You can brute-force trying random things and checking the result and eventually learn something new. With a better heuristic, you can make better guesses and learn new things much more efficiently. There’s no reason to believe that kind of guess-and-check is outside of the reach of LLMs, or that most of our new discoveries are not found the same way.

I come back to something like this idea when I consider the distinction being made that LLMs can only combine and interpolate between points in their training material. I could write a brute-force program that just used an English dictionary to produce every possible one-billion-gazillion word permutation of the words within, with no respect for rules of language, and chances are there would be some provable, testable, novel insight somewhere in the results if you had the time to sift through and validate all of it. LLMs seem like a tool that can search that space more effectively than any we've had before.
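A toy version of that brute-force idea, with two substitutions that are assumptions of this sketch: a six-token vocabulary of digits and operators stands in for the dictionary, and "validating" a sequence means Python can parse it as arithmetic and it evaluates to a target number. It enumerates all 6^5 = 7,776 sequences. The count grows as V^n for vocabulary size V and length n, which is why blind enumeration stops being practical very quickly and why a better generator (or a better heuristic) matters so much.

```python
from itertools import product
from typing import List, Tuple


def generate_and_test(vocab: List[str], length: int, target: int) -> Tuple[int, List[str]]:
    """Try every token sequence of the given length; keep the ones that pass the check."""
    tried = 0
    hits: List[str] = []
    for tokens in product(vocab, repeat=length):
        tried += 1
        candidate = "".join(tokens)
        try:
            # The "check": does Python accept it as arithmetic, and what is its value?
            # An empty __builtins__ is NOT a real sandbox; this is only acceptable
            # because every candidate is built from a fixed, harmless vocabulary.
            value = eval(candidate, {"__builtins__": {}}, {})
        except SyntaxError:
            continue  # most blind combinations are not even well-formed
        if value == target:
            hits.append(candidate)
    return tried, hits


if __name__ == "__main__":
    vocab = ["1", "2", "3", "+", "-", "*"]  # toy stand-in for a dictionary
    tried, hits = generate_and_test(vocab, length=5, target=7)
    print(f"tried {tried} sequences, {len(hits)} evaluate to 7, e.g. {hits[:5]}")
```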

If we managed to create very fast monkeys with typewriters, plus software that can review their output quickly enough that we end up with a result that's worth reading, we'd still have people insisting that we've created intelligence. The monkeys, however, remain monkeys.

I think intelligence is an orthogonal, mostly philosophical question, separate from whether these tools can produce novel, useful output vs. purely recombinant output.

I think that enough purely recombinant output will eventually produce novel, useful output.

I think of most things you can get to by guess-and-check as definitionally inside the hull; most forms of guess-and-check amount to taking some existing thing, randomizing a bunch of its parameters, and seeing what you get. Whereas with something like relativity, there's not even a starting point in the pre-existing knowledge space that you can randomize and guess/check your way from to reach relativity. That's more like adding a new dimension to the space entirely.

It's possible LLMs can handle this after all! But at least so far we only have existence proofs of humans doing this, not LLMs, and I don't think it's easy to be certain how far away LLMs are from doing this. I should distinguish between LLMs and AI more generally here; I'm skeptical LLMs can do this, but I think some other kind of more complete AI almost certainly can.

I suppose you could just, I dunno, randomly combine words into every conceivable sentence and treat each new sentence as a theory to somehow test, brute-forcing your way through the infinite possible theories you could come up with. But at that point you're closer to the whole infinite-monkeys-producing-Shakespeare thing than you are to any useful conclusion about intelligence.

I think your point about “you could randomly generate a sequence of words, which could in principle produce a text interpretable as expressing any particular expressible-as-a-sequence-of-words novel good idea” pretty much refutes the idea that guessing and checking can only result in things inside such a convex hull, unless said hull already contains everything. Of course, there’s a significant role to play by the “checking” part.

Like, “take a random sequence of bits and interpret it as Unicode” is at one end of a scale, and “take a random sequence of words in a language” is just a tad away from it, and the scale continues in that direction for quite a while.

It's also worth noting that in very high dimensions, the convex hull will contain massive volume. It could well be the case that humans established that convex hull millions of years ago, and all of our inventions and innovations since have fallen inside it.

> There’s no reason to believe that kind of guess-and-check is outside of the reach of LLMs

This doesn't make any sense: by their nature they can't "guess-and-check" things outside their training set.

> You can brute-force trying random things and checking the result and eventually learn something new.

And most of the mathematicians seem to welcome this "brute forcing" by the LLMs. It connects pieces that people didn't realize could be connected. That opens up a lot of avenues for further exploration.

Now, if the LLMs could just do something like ingesting the Mochizuki stuff and give us a decent confirmation or disproof ...

I like this construction, but I don’t think you take it far enough.

If you have a multidimensional space and you are trying to compute which points lie “inside” some boundary, there are large areas that will be bounded by some dimensions but not others. This is interesting because it means if you have a section bounded by dimensions A, B, and C but not D, you could still place a point in D, and doing so then changes your overall bounds.

I think this is how much of human knowledge has progressed (maybe all non-observational knowledge). We make observations that create points, and then we derive points within the created space, and that changes the derivable space, and we derive more points.

I don’t see why AI couldn’t do the same (other than technical limitations related to learning and memory).

I was a little muddy in my original post on distinguishing between what I think LLMs might be able to do and what AI broadly might be able to do. I'm skeptical LLMs can expand the hull or add dimensions to the space, but I also don't think the reasons for that skepticism necessarily apply to all AI systems generally.

I found this thought-provoking and just had to see how the new Gemini 3.5 Flash reasoned about it (I find it fun to go meta on modern AI like this), and I'm happy that I did! It was also an opportunity to trial this recent model.

https://g.co/gemini/share/065ffa89698e

> I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.

Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.

For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. IMO, it's unlikely that an LLM will somehow invent it.

Creation is done by humans who have been trained on the data of their life experiences. Nothing new is being created, just changing forms.

A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.

How can we truly create something? Everything is built upon something.

You could argue that even "numbers" are a creation, but are they? Aren't they just a tool to access an abstract concept of counting? ... Symbols... abstractions.

Another angle to look at it: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived: derived from brain chemistry, experiences and so on. We may not have a full reduction of how each and every thing works.

Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) both can create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.

>a "telescope" is not a new creation

It was a new concept: combining lenses to look at things far away as if they were close. The literal atoms/molecules weren't new, but the form they were arranged in was. The purpose of the arrangement was new too.

> Aren't they just a tool to access an abstract concept of counting?

Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?

that’s why we say that with such discoveries we receive a new way – of looking, of doing, of thinking… these new paths preexist in the abstract, but they can be taken only once they’ve been opened. and that is as good as anything “new” gets. (and such discoveries are often also inventions, since opening them requires a ruse applied in a specific way.)

"new kind of math"

Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.

The map is not the territory.

I don't know what you're even trying to argue here.

We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature - structural realism didn't die as a philosophy of science just because someone came up with a pithy saying); we're talking about whether math is discovered or invented.

Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.

> we have created operations

It seems that addition (for instance) was "created" long before us.

On the other hand, it seems highly unlikely that a civilization similar to ours could "invent" an essentially different kind of mathematics (or physics, etc.)

Where does this mathematics exist before we discover it?

I know of no realm where mathematical objects live except human minds.

No, it seems clear to me that mathematics is a creation of our minds.

If it were merely a creation, there would be no reason for two independent mathematicians to land on the same creation given some directed effort. But of course we do see that. There is an objectivity to mathematics that must be accounted for.

"Where" mathematics exists is in the abstract combinatorial space of an infinite repeating application of logical rules. This space doesn't exist in a substantive sense, but it is accessible/navigable by studying the consequences of logical rules. It is the space of possible structure.

Does that correction matter, tho…? Discovered or created, it would be new to us, and is clearly not easy to reach!

It could be that RH is independent of current mathematical axiom systems. We might even prove that it is some day. But that means we are free to give it different truth values depending on the circumstances!

This is also true for established theorems! We can imagine mathematical universes (toposes) where every (total) function on the reals is continuous, even though it is an established theorem that there are discontinuous functions! We just need to replace a few axioms (chuck out the law of the excluded middle, and throw in some continuity axioms).
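A standard illustration of the excluded-middle point: the step function below is an ordinary, discontinuous, total function in classical analysis, but defining it by cases presupposes that every real number either is or is not negative, and that dichotomy is an instance of excluded middle that is not provable constructively. In settings where every total function on the reals is continuous, H simply cannot be defined on all of the reals.

```latex
H(x) =
\begin{cases}
  0, & x < 0,\\
  1, & x \ge 0,
\end{cases}
\qquad
\text{total on } \mathbb{R} \text{ only if } \forall x \in \mathbb{R}\,(x < 0 \lor x \ge 0)
```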

I think “new math” is ‘just’ humans creating new terminology that helps keep proofs short (similar to how programmers write functions to keep the logic of the main program understandable), and I agree that is something LLMs are bad at.
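The programming side of that analogy, as a small hypothetical sketch: once the concept "prime" has a name, a statement like Goldbach's conjecture (every even number from 4 up is a sum of two primes) fits in one readable line instead of inlining the primality logic twice. This only checks the conjecture up to a finite bound, so it is not a proof, and the function names are made up for illustration.

```python
def is_prime(n: int) -> bool:
    """The "new term": a name for a concept that later statements can reuse."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_holds_up_to(limit: int) -> bool:
    """Every even n with 4 <= n <= limit is a sum of two primes (finite check only)."""
    return all(
        any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))
        for n in range(4, limit + 1, 2)
    )


if __name__ == "__main__":
    print(goldbach_holds_up_to(1000))
```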

However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).

In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.

what basis do you have for assuming an LLM is fundamentally incapable of doing this?

What's your basis for assuming LLM is capable of doing this?

I honestly don't know personally either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois" are going to do.

Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.

"I don't see them making the next great song"

Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.

Also - music is subjective. Mathematics isn't.

And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.

>> Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.

There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of them actually good music / real hits.

>> LLM discovered a new way to reason about a conjecture

I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building new kinds of maths, or approaching it in a creative/artistic way. That's what I was referring to.

I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.

LLMs are already making the next great songs. Just check out the Billboard charts.

I'm sorry, I don't consider them "great songs". Obviously, different people have different taste.

> what basis do you have for assuming an LLM is fundamentally incapable of doing this?

because I have no basis for assuming an LLM is fundamentally capable of doing this.

Good on you for spelling out this reasoning, but it is manifestly unsound. For a wide variety of values of X, people a few years ago had no reason to expect that LLMs would be capable of X. Yet here we are.

In 1989, Garry Kasparov said that it was "ridiculous!" to suggest a computer would ever beat him at chess.

"Never shall I be beaten by a machine!”

In 1997 he lost to Deep Blue.

Yeah, and back then people moved the goal posts too, saying Deep Blue was just "brute-forcing" chess (which isn't even true since it's not a pure minimax search).

Deep Blue was brute forcing chess in the sense that AlphaGo wasn't brute forcing Go.

And today he's got salient observations on politics which hold much of his attention, and Deep Blue is shut off and has done nothing further.

Not a good argument for turning everything over to the Deep Blues. What's Deep Blue done for me lately?

This is something that could be demonstrated rather than just argued.

Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.

If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.

We tried this experiment with humans, back in the 17th century, and only a few[1] out of millions managed it given a whole human lifetime each.

[1] Obviously Newton counts as one. Leibniz like Newton figured out calculus. Other people did important work in dynamics though no one else's was as impressive as Newton's. But the vast majority of human-level intelligences trained on texts prior to Newton did not create calculus or derive the equations of motion or come close to doing either of those things.

Except this has been said since the 2010s and has been proven wrong again and again. Clearly the theory that LLMs can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect). Before the rise of ChatGPT, the onus was on the labs to show it was plausible. At this point, I think the more epistemologically honest position is to put the burden back on the naysayers. At the least, they need to admit they were wrong and give a satisfactory explanation of why their conceptual model was unable to account for the tremendous success of LLMs and why their model is still correct going forward. Realistically, progress on the "anti-LLM" side requires a more nuanced conceptual model, developed carefully, outlining and demonstrating the fundamental deficiencies of LLMs (not just deficiencies in current LLMs, but a theory of why further advancements can't solve them).

Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that, because of the nature of the domain, everyone has an opinion on LLMs, while the ML vs. statistics battle was mostly confined within technical/academic spaces.

Because by definition LLMs are permutation machines, not creativity machines. (My premise, which you may disagree with, is that creativity/imagination/artistry is not merely permutation.)

I prefer to think of it as: they’re interpolation machines, not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought, as opposed to expected thought that is very complex to evaluate.

People keep saying this, but if you try to interpret this at all literally, it just doesn’t work. Like, it’s phrased like it should have a precise meaning, right? Like, people even mention convex hulls when talking about it.

But if you actually try to take a convex hull of, some encoding of sentences as vectors? It isn’t true. The outputs are not in the convex hull of the training data.

I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing, especially since there are contexts in machine learning where literal interpolation vs. literal extrapolation is relevant. So, please, find a better way to say it than “it can only interpolate”?
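One way to see the high-dimensional point numerically, under the simplifying assumption that "training data" is n i.i.d. standard Gaussian vectors in d dimensions (real sentence embeddings are not like this). The convex hull sits inside the per-coordinate [min, max] bounding box, so any point outside the box is also outside the hull. For independent continuous coordinates, a fresh point lands inside that box with probability ((n - 1)/(n + 1))^d, which collapses as d grows: with n = 1000 points and d = 1000 dimensions it is about 0.135, so at least about 86% of fresh samples lie outside the hull. A minimal sketch of the formula plus a Monte Carlo cross-check:

```python
import random


def inside_box_probability(n: int, d: int) -> float:
    """Exact probability that a fresh i.i.d. point lies inside the per-coordinate
    [min, max] box of n training points, for independent continuous coordinates.

    Per coordinate, the fresh value is equally likely to take any of the n + 1
    ranks, so it falls strictly between min and max with probability (n-1)/(n+1).
    """
    return ((n - 1) / (n + 1)) ** d


def simulate_inside_box(n: int, d: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability with standard Gaussian data.
    A fresh training set is drawn for every trial so the trials are independent."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        train = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        fresh = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if all(
            min(row[j] for row in train) <= fresh[j] <= max(row[j] for row in train)
            for j in range(d)
        ):
            inside += 1
    return inside / trials


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"n=1000, d={d:4d}: P(inside box) = {inside_box_probability(1000, d):.4f}")
    # Small case where simulation is cheap enough to cross-check the formula.
    print(simulate_inside_box(n=20, d=10, trials=3000), inside_box_probability(20, 10))
```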

This "new math" might be a recombination of things that we already know - or an obvious pattern that emerges if you take a look at things from a far enough distance - or something that can be brute-forced into existence. All things LLMs are perfectly capable of.

In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.

> This "new math" might be a recombination of things that we already know

If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.

Newton and Leibniz were "hand-waving"?

If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.

It’s not that. Consider the definition of the limit. The idea existed for a long time. Newton/Leibniz had the idea.

That idea wasn’t formally defined until about 134 years later, with Cauchy’s epsilon-delta approach, and only then was it accepted. (I know there were earlier proofs.)

There are even arguments that the limit existed before Newton and Leibniz, in Archimedes’ bounds on the value of pi.

Cauchy’s deep understanding of limits also led to the creation of complex function theory.

These forms of creation are hand-wavy not because they are wrong. They are hand-wavy because they leverage a deep level of ‘creative intuition’ in a subject.

That is an intuition a later reader may not have and will want to formalize to deepen their own understanding of the topic, often leading to deeper understanding and new maths.

> Newton and Leibniz were "hand-waving"?

Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the second half of the 19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.

This isn't unique to Newton or Leibniz; Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems but were also not formalized until much later.

I think that I just take issue with the term "hand-waving" as equated to intuition. Yeah it lacked formal rigor, but they had a solid model that applied in detail to the real world. That doesn't come from just saying, "oh well, it'll work itself out". I guess if you want to call that "hand-wavy" we'll just have to disagree.

Euclid tells me otherwise. Rules, no art, no bullshit. Rules. Humanities people somehow never get it. It's not about arithmetic.

Vibe-what? Vibe-bullshit, maybe; cathedrals in Europe and such weren't built by magic. Ditto with sailing and the like. Tons of mathematics and geometry there, and tons of damn axioms, before the US even existed.

Heck, even the Book of Games of Alfonso X "the Wise" has both a compendium of game rules and this: https://en.wikipedia.org/wiki/Astronomical_chess where, of course, being skilled at geometry was mandatory, at least to design the boards.

On Euclid:

https://en.wikipedia.org/wiki/Euclid%27s_Elements

P.S.: Geometry lays plenty of the groundwork for calculus. Guess why.

And yet nowadays you can restate all of it using just combinations of sets of sets and some logic operators.

god of the gaps

non overlapping magisteria

What is creativity if not permutation? A brain has some model of the world and recombines concepts to create new concepts.

[flagged]

This is really not an acceptable reply. How about actually engaging with the point the commenter made instead of stamping your foot and throwing a tantrum.

Innovation is just another word for 'enhanced copy'. Everything is a copy, except for nature.

It pretty much is, otherwise it is randomness or entropy.

LLMs by themselves are not able to, but you are missing a piece here.

LLMs are prompted by humans, and the right query may make them think/behave in a way that creates a novel solution.

Then there's a third factor now: agentic AI systems that run LLMs in a loop, where they can research, try, and experiment in their own loop that's tied to the real world for feedback.

Agentic + LLM + initial human prompter, by definition, can have it experiment outside of its domain of expertise.

So that extends beyond "LLMs can't create novel ideas", and I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.

You're proving the GP's argument: LLMs aren't creative. You say as much yourself; it's the driving that is the creative force.

You can tell an agentic system: "Go and find a novel area of math that has unresolved questions and solve it mathematically, with the properties verified in Lean. Before you start working on a problem, verify that no one has solved it."

That's not a creative prompt. That's a driving prompt to get it to start its engine.

You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before, as long as you set it up with all its tool calls/permissions.

Let me know when the Fields medal arrives in the mail.

It won't, because even though it looks clever to you, people who /do/ understand math and LLMs understand that LLMs /are/ regurgitating.

Why does your LLM need you to tell it to look in the first place? Why isn't it just telling us all the answers to unsolved conjectures, known and unknown?

Why isn't the LLM just telling us all the answers to all the problems we are facing?

Why isn't the LLM telling us, step by step with zero error, how to build the machine that can answer the ultimate question?

Here's a Fields Medalist commenting who doesn't seem to believe that.

https://x.com/wtgowers/status/2057175727271800912

Um - all I see is

> Timothy Gowers @wtgowers

> @wtgowers

> If you are a mathematician, then you may want to make sure you are sitting down before reading further.

If your refutation requires someone to have an account, log in, and read something, it's meaningless.

Try https://xcancel.com/wtgowers/status/2057175727271800912

It's readable to most. It's annoying having to wade through ex-Twitter, but there are workarounds.

Thanks - I'll read that and the above linked OpenAI PR

But, I remain sceptical

The (linked by OpenAI) comment paper by various tangential mathematicians was the most interesting read from my PoV:

https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...

It includes the longer remarks by Gowers and others.

I believe when we have AI agents "living" 24/7, they will become creative machines. They will test their own ideas experimentally, come across things accidentally, and synthesize new ideas.

We just haven't let AI run wild yet. But it's coming.

So are self-driving cars - as they have been for the last... decade or so

AGI has been "just over the horizon" for literal decades now - there have been a number of breakthroughs and AI Winters in the past, and there's no real reason to believe that we've suddenly found the magic potion, when clearly we haven't.

AI right now cannot even manage simple /logic/

If that’s a requirement, aren’t LLMs driven by pretraining which was human driven?

Who decides the last point at which it’s OK to provide text to the model and still be able to describe it as creative? (non-rhetorical)

  math more like an art than a science.
That’s a fun turn of phrase, but hopefully we can all agree that math without scientific rigor is no math at all.

  we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
Do you think it’s possible/likely that any AI system could? I encourage us to join Yudkowsky in anticipating the knock-on results of this exponential improvement that we’re living through, rather than just expecting chatbots that hallucinate a bit less.

In concrete terms: could a thousand LLM-driven agents running on supercomputers, 500 of which are dedicated to building software for the other 500, come up with new math?

Math is not based on science!

Maths follows logical (or even mathematical) rigour, not scientific rigour!

You have a good point about the human rate of mathematical discovery, but Ayer was an idiot and later Witt contradicted early Witt. For the "already implicit" claim to be true, mathematics would have to be a closed system. But it has already been proven that it is not. You can use math to escape math, hence the need for Zermelo-Fraenkel and a bunch of other axiomatic pins. The truth is that we don't really understand the full vastness of what would objectively be "math", and it is possible that our perceived math is terribly wrong and a subset of a greater math. Whether that greater math has the same seemingly closed-system properties is not something that can be known.

> Whether that greater math has the same seemingly closed system properties is not something that can be known

Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. And so on. At each point, new ideas are invented to answer questions that were previously unanswerable. There is a long history of this. "Any closed system has questions it cannot answer from within itself" is a paraphrase of Gödel's incompleteness theorem.
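
The chain of extensions described above can be written as a sequence of equations, each stated using only numbers from the previous system but solvable only in the next one. This is a standard textbook illustration, not an account of the actual historical order:

```latex
\begin{align*}
x + 3 &= 1   && \text{no solution in } \mathbb{N}, && x = -2 \in \mathbb{Z} \\
2x &= 1      && \text{no solution in } \mathbb{Z}, && x = \tfrac{1}{2} \in \mathbb{Q} \\
x^2 &= 2     && \text{no solution in } \mathbb{Q}, && x = \pm\sqrt{2} \in \mathbb{R} \\
x^2 + 1 &= 0 && \text{no solution in } \mathbb{R}, && x = \pm i \in \mathbb{C}
\end{align*}
```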

At this point I think the category theorists hit the foundational idea squarely on the mark:

1. Start with a few simple but non-trivial terms and axioms

2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate

3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself

4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction

I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.

It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply says "this is what it means to exist and to interact with other things that exist". Axioms that merely give you enough to do any kind of mathematics in the first place, that is. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.

I agree with you all around, except that it's actually somewhat up for debate whether the PI "contradicts" the Tractatus. That is, there is the so-called "resolute reading" of the Tractatus, which had some traction for a while.

But note this is more to say that the Tractatus is like the PI, not the other way around. And in that reading, takes like GP's would be considered the "nonsense" we are supposed to "climb over" in the last proposition of the Tractatus.

As others have pointed out, both can be true:

* LLMs do just interpolate their training data, BUT-

* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data

In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!

As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are useful less for new discoveries than for eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another: quack scientists, buried papers, or just sheer data volume overwhelming a limited pool of expertise.

For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.

> LLMs do just interpolate their training data

"interpolate" has a technical meaning - in this meaning, LLMs almost never interpolate. It also has a very vague everyday meaning - in this meaning, LLMs do interpolate, but so do humans.

An LLM in a harness with any tools (even a calculator) doesn't just interpolate because it can reach states out of its own distribution.

> * That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data

One can argue, new knowledge is just restructured data.

I think the main concern about LLMs is the inherent "generative" aspect, which leads to hallucinations as a byproduct, because that's what produces the noise. Joint-embedding approaches are an interesting alternative that try to overcome this, but they're still in the research phase.

Recombining existing material is exactly right, and in this case LLMs were uniquely positioned to make the connection quicker than any group of humans.

The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.

Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.

Monstrous Moonshine?

I'm just hoping we're almost past this phase of needing to assess LLM capabilities against an arbitrary one-dimensional yardstick labeled 'Not Human' on one end and 'Beyond Human' on the other.

It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov it didn't change anything, but also in the sense that some animals and machines have always been 'better' than humans on some dimensions. And it's pointless because there's never been just one yardstick, and even if there were, it's not one-dimensional or even linear. Everyone has their own yardstick, and the end points on each change over time.

Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.

It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

LLMs ingest and output tokens, but they don’t compute with them. They have internal representations of concepts, so they have some capability to work with things which they didn’t see but can map onto what they know. The surprise and the whole revolution we’re going through is that it works so well.

> they don’t compute with them

Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
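
A toy analogy for that argument, under obvious simplifying assumptions: `step` (a hypothetical name) stands in for one forward pass and does only constant work, reading the previous scratchpad token and one input bit. A single call cannot compute the parity of an arbitrarily long input, but looping it and appending each output to the context lets the scratchpad carry the running state. This illustrates the idea only; it is not how a transformer is implemented.

```python
def step(context, bits):
    """One bounded 'forward pass': combine the last scratch token with the next input bit.

    context is the list of scratch tokens emitted so far; its length tells us
    which input bit to read next.
    """
    previous = context[-1] if context else 0
    return previous ^ bits[len(context)]

def parity_with_scratchpad(bits):
    """Call step repeatedly, appending each output to the context (the 'chain of thought')."""
    context = []
    while len(context) < len(bits):
        context.append(step(context, bits))
    answer = context[-1] if context else 0
    return answer, context

print(parity_with_scratchpad([1, 0, 1, 1]))
```

Each scratch token holds the parity of the prefix read so far, so the context acts as working memory that no single bounded `step` call has.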

They recombine and reuse the patterns in their training data, not the surface level training data itself.

An LLM generating Arc code is using the LISP patterns it learnt from training, maybe patterns from other programming languages too.

> So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

And yet LLM/AIs can't count parentheses reliably.

For example, if you take away the "let" forms from Claude, which forces it to desugar them to "lambda" forms, it will fail very quickly. This is a purely mechanical transformation and should be error-free. The significant increase in ambiguity completely stumps LLMs/AI after about 3 variables.

This is why languages like Rust with strong typing and lots of syntax are so LLM friendly; it shackles the LLM which in turn keeps it on target.
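
To make the let-to-lambda desugaring described above concrete, here is a minimal sketch, assuming s-expressions are already parsed into nested Python lists and that `let` has plain (parallel) Scheme semantics, so every binding becomes one lambda parameter; `let*` would instead need one nested lambda per binding. The function names are hypothetical. A balanced-parentheses counter is included as well, since counting parentheses is the other mechanical task mentioned.

```python
def desugar_let(expr):
    """Rewrite (let ((v e) ...) body ...) as ((lambda (v ...) body ...) e ...), recursively."""
    if not isinstance(expr, list):
        return expr  # symbols and numbers pass through unchanged
    if expr and expr[0] == "let":
        bindings, body = expr[1], expr[2:]
        names = [name for name, _ in bindings]
        values = [desugar_let(value) for _, value in bindings]
        return [["lambda", names, *[desugar_let(form) for form in body]], *values]
    return [desugar_let(form) for form in expr]

def to_text(expr):
    """Print a nested-list s-expression back as text."""
    if isinstance(expr, list):
        return "(" + " ".join(to_text(e) for e in expr) + ")"
    return str(expr)

def parens_balanced(text):
    """Track nesting depth; fail on a stray ')' or on an unclosed '('."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(to_text(desugar_let(["let", [["x", 1], ["y", 2]], ["+", "x", "y"]])))
# prints ((lambda (x y) (+ x y)) 1 2)
```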

You can build a census of all gen-2, degree-2 formal products of polynomial-like terms. If you insist on instituting your own rewrite rules and identity tables, it is straightforward (maybe 15 minutes of compute time) to perform a complete census of all of the algebraic structures that naturally emerge. Every even vaguely studied algebra that fits in the space is covered by the census (you've got to pick a broad enough set of rewrite and identity operations). There are even a couple of "unstudied" objects (just 2 of the billion or so objects); for instance:

    (uv)(vu) = (uu)(vv)
Shows up as a primitive structure, quite often.

If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
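
This sketch does not reproduce the census described above (its rewrite rules and identity tables aren't specified). It only illustrates, under that caveat, what checking one such identity looks like: enumerate every binary operation on a small set (a finite magma) and test whether (uv)(vu) = (uu)(vv) holds for all u and v. The function names are hypothetical.

```python
from itertools import product

def all_operations(n):
    """Yield every binary operation on {0, ..., n-1} as an n-by-n table: table[a][b] = a*b."""
    for entries in product(range(n), repeat=n * n):
        yield [list(entries[row * n:(row + 1) * n]) for row in range(n)]

def satisfies_identity(table):
    """Check (uv)(vu) == (uu)(vv) for every pair u, v under the given operation table."""
    def op(a, b):
        return table[a][b]

    n = len(table)
    return all(
        op(op(u, v), op(v, u)) == op(op(u, u), op(v, v))
        for u in range(n)
        for v in range(n)
    )

def census(n):
    """Return (number of operations satisfying the identity, total number of operations)."""
    tables = list(all_operations(n))
    hits = sum(satisfies_identity(t) for t in tables)
    return hits, len(tables)

print(census(2))
```

Any commutative semigroup satisfies the identity, but so does the non-commutative left projection a*b = a, while the right projection a*b = b fails it, which hints at why the identity carves out its own class of structures.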

I feel this is the case whenever I "problem solve". I'm not really being creative; I'm pruning a graph of a conceptual space that already exists. The more possibilities I see, the easier it is to move towards an optimal route between the nodes, but I didn't "create" those nodes or edges; they are just causal inevitabilities.

I don't know, this sort of just seems like you're really stretching the meaning of "creative". The conceptual space of the graph already exists, but the act of discovering it, or whatever you want to call that, is itself creative. Unless you're following a predefined algorithm (certainly sometimes, arguably always, I suppose), seeing the possibilities has to involve some creativity.

> seeing the possibilities has to involve some creativity.

I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).

But as you become more of an expert in a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.

My day job does not include sparse graphs.

I'm not sure how feasible this is, but I love the thought experiment of limiting a training set to a certain time period, then seeing how much hinting it takes for the model to discover things we already know.

E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.

This is a good point, and there’s some deep philosophical questions there about the extent to which mathematics is invented or discovered. I personally hedge: it’s a bit of both.

That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (gold at the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.

So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal,” and all that). But that’s not usually what people mean when they say similar-sounding things about LLMs.

Side note: don't underestimate how much literal, physical time and energy "unfold" implies. Proofs occur on physical substrates.

We know that LLMs "just interpolate" their training data. Maybe there's a mystery about what "just interpolate" means when the data set gets enormous. But we know what LLMs do.

If anything, this is more of an illustration of how LLMs are not useful to us...

They will do their own thing, don't need us. In fact, we will be in the way...

We can choose to study them and their output, but they don't make us better mathematicians...

> They will do their own thing, don't need us. In fact, we will be in the way...

You can take some comfort in the fact that it took a human to tell the LLM to even attempt this. They do nothing on their own. They have no will to do anything on their own and no desire for anything that doing something might get them. In that sense we won't ever be in their way. We will be the only way they ever do anything at all.

I see where you are coming from.

However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.

Of course, this heavily depends on whether we can get LLMs’ outputs to be accurate enough.

Something that can instantly tell you the answer to every math question will make people worse at math, not better. Building "mathematical maturity", skill, and understanding requires struggle.

There is a creational aspect in math - definitions and rules are created.

And this is one of the many issues with invoking the logical positivists here...

I'm not even sure why they were invoked. Even disregarding the big technical debunkings such as Quine's "Two Dogmas", sociologically, and even judging by talking to real mathematicians (see Lakatos historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.

"LLMs just interpolate their training data"

Cracks me up.

What exactly do we think that human brains do?

I agree. Humans are given a body that lets them "discover" things by accident, test out ideas, i.e., randomness.

As in, I would hazard a guess that the discovery of the wheel wasn't "pure intelligence"; it was humans accidentally seeing a rock roll down a hill and getting an idea.

If we give AI a "body", it will become as creative as humans are.

That has been the question since the beginning of humans.

Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.

The optimists believe brains are very special and we’re far from replicating what they do in silicon.

The pessimists just see a 20W meat computer.

You have to define what you mean by "interpolate". The mechanisms that LLM use are not mysterious, and they are not the same as used by humans.

If you interpret “interpolate” in the literal sense, and apply it to the mechanisms behind LLMs, then the claim that they only interpolate, is straightforwardly false.

Taking it instead as a metaphorical claim may be more valid, but in that case it doesn’t depend on our understanding of how LLMs work.

Creativity is hard. Pretty much needs a fuzzer process to generate new strings, mostly nonsense, & pick up when that nonsense happens to be correct
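
A minimal sketch of that generate-and-filter idea, assuming "correct" can be checked mechanically: random strings over a tiny alphabet are mostly syntax errors, and a verifier keeps the few that happen to evaluate to a target. The alphabet, target, and helper names are hypothetical choices made only so the loop runs.

```python
import random

ALPHABET = "12+*"  # digits and operators only, so eval() never sees names or builtins

def evaluates_to(candidate, target):
    """True if candidate is a syntactically valid expression whose value equals target."""
    try:
        return eval(candidate, {"__builtins__": {}}, {}) == target
    except SyntaxError:
        return False  # most random strings are nonsense like "1+*" and land here

def fuzz(target, max_len=5, trials=5000, seed=0):
    """Generate random strings and keep the ones that happen to be correct."""
    rng = random.Random(seed)
    found = set()
    for _ in range(trials):
        length = rng.randint(1, max_len)
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if evaluates_to(candidate, target):
            found.add(candidate)
    return found

print(sorted(fuzz(4)))
```

Nearly all the work is done by the verifier; the generator knows nothing about arithmetic.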


We don't know what human brains do.

We have some idea.

[deleted]
[deleted]

I love this comment because it so clearly highlights the difference between intelligence and reasoning.

A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.

While that works for a large number of tasks, this intelligence is not the same as reasoning.

Reasoning is the ability to discover new information that you haven't seen before (i.e., growing a new branch on the knowledge tree instead of interpolating).

Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.

With interpolation, your smaller shapes are medium-sized, each non-rectangular. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.

Reasoning, on the flip side, is having access to very fine shapes and knowing the procedure for stacking them, depending on what shapes are next to them and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
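
A toy version of that floor-filling analogy, assuming a square grid floor and square tiles (the analogy uses arbitrary shapes; squares just keep the code short): greedily placing medium 2×2 tiles leaves gaps that 1×1 tiles fill completely. `square_region` and `greedy_cover` are hypothetical names.

```python
def square_region(size):
    """All cells of a size-by-size floor."""
    return {(r, c) for r in range(size) for c in range(size)}

def greedy_cover(region, block):
    """Place block-by-block squares in scan order wherever one fits inside the region
    without overlapping earlier squares; return the set of covered cells."""
    covered = set()
    for r, c in sorted(region):
        cells = {(r + dr, c + dc) for dr in range(block) for dc in range(block)}
        if cells <= region and not (cells & covered):
            covered |= cells
    return covered

floor = square_region(3)
print(len(greedy_cover(floor, 2)), "of", len(floor))  # prints 4 of 9
```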

Maybe the human brain also does other things besides interpolation?

There is pre-training, and then empirical observations.

Yes?

Pretty much everything that appears novel in life is derivative of other works or concepts.

You can watch a rock roll down a hill and derive the concept for the wheel.

Seems pretty self evident to me

This is the second reference to Wittgenstein I’ve seen today in totally different contexts. Reminded me how much I vibe with his Tractatus.

this is an excellent point, new ground isn't necessarily novel, it's a rearrangement of existing pieces

To every proof, there is a corresponding program. This makes proofs expressible in a language made up of finite grammatical rules and terminal symbols. Knowledge accessible by proof is thus always a form of interpolating data whether made up by an AI model or a human mathematician. The people dismissing AI because of claims that it can only interpolate data don't have a good understanding of what it means to know something. Now of course not everything can be known via proof but for the sorts of things that we want to know via a computer this is a fine compromise.
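
The proof/program correspondence mentioned here (Curry-Howard) can be seen directly in a proof assistant. A minimal Lean 4 sketch using standard textbook examples (the theorem names are illustrative): each proof below is an ordinary function term.

```lean
-- A proof of "A implies (B implies A)" is a function that ignores its second argument.
theorem const_proof (A B : Prop) : A → B → A :=
  fun a _ => a

-- Modus ponens is just function application.
theorem modus_ponens (A B : Prop) (h : A → B) (a : A) : B :=
  h a

-- Commutativity of "and": the proof term swaps the two components of a pair.
theorem and_swap (A B : Prop) : A ∧ B → B ∧ A :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```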

I think someone should be talking to Gödel.

Post hoc ergo propter hoc

There was a project long, long ago where every known piece of knowledge was cross-pollinated with every other piece of knowledge, creating new and unique pieces of knowledge, and the intention was to use that machine to invalidate the patent process: obviously, everything had therefore been invented.

But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.

THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.

Take fire, for example: it was in nature for hundreds of millions of years before we as a species understood that it needed air, fuel, and heat in order to exist at all, and we then leveraged that knowledge into controlling fire: creating, growing, reducing, destroying it.

LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor do they appear to know how to leverage them.

edit: I am going to go further

We have only in the last few hundred years realised how to see things that are smaller than what our eyes can naturally see: we've used "glass" to see bacteria and spores, and we've realised that we can use electrons to see even smaller things.

We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms

That much is derived from previous knowledge

What isn't derived from previous knowledge, and what LLMs cannot create, is the tools by which we can detect or see these incredibly small things.


[flagged]

I think you are conflating composition and prediction. LLMs don't compose higher abstractions from the "axioms, symbols and rules", they simply predict the next token, like a really large spinning wheel.

Yes they do…? Who cares if they just predict the next token? The outcome is that they can invent new abstractions. You could claim that the invention of this new idea is a combination of an LLM and a harness, but that combination can solve logic puzzles and invent abstractions. If a really large spinning wheel could invent proofs that were previously unsolved, that would be a wildly amazing spinning wheel. I view LLMs similarly. It is just fancy autocomplete, but look what we can do with it!

Said differently, what is prediction but composition projected forward through time/ideas?

Ask an LLM to invent a new word and post it here, I will be waiting. You will see that it simply combines words already in the training data.

I'm not sure what the point of this exercise is. My prompt to ChatGPT: "Create a new English word with a reasonably sounding definition. That word must not come up in a Google search." Two attempts did come up in a search, the third was "Thaleniq (noun)". Definition: The brief feeling that a conversation has permanently changed your opinion of someone, even if nothing dramatic was said. Nothing in Google. There, a new word, not sure it proves or disproves anything. Or is it time to move the goal posts?

Why is everyone who responds to this with a real example immediately flagged/dead?

HN autokills LLM generated comments. People don’t seem to believe this, but there’s proof for you.


Splifket

Definition: That highly specific, short-lived burst of nervous energy that makes you accidentally drop a small object (like a pen, a guitar pick, or a piece of LEGO) immediately after picking it up.

Does a random sequence of letters qualify as a new word?

[flagged]

[dead]

"Who cares if they just predict the next token?"

Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.

One might argue that the composition of higher abstractions is the next token predicted after "here is a higher abstraction:"

"Predicting the next token" is meaningless. Every process that has any sort of behavior, including a human writing, can be modeled by some function from past behavior to probability distribution of next action. Viewed this way, literally everything is just "predicting" the next action to be taken according to that probability distribution.

The most likely series of next tokens when a competent mathematician has written half of a correct proof is the correct next half of the proof. I've never seen anyone who claims "LLMs just predict the next token" give any definition of what that means that would include LLMs, but exclude the mathematician.
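
A toy illustration of that point, with hypothetical function names: an ordinary addition algorithm, wrapped so that it only ever outputs a probability distribution over the next character. Decoding greedily from it produces exactly the right answer, so "it just predicts the next token" says nothing by itself about whether a correct procedure sits underneath.

```python
def next_token_distribution(prefix):
    """Given 'a+b=' plus a partial answer, put all probability on the correct next character."""
    question, _, partial = prefix.partition("=")
    a, b = (int(x) for x in question.split("+"))
    answer = str(a + b) + "$"  # "$" marks the end of the answer
    return {answer[len(partial)]: 1.0}

def generate(prompt):
    """Greedy decoding: repeatedly append the most probable next token until '$'."""
    text = prompt
    while True:
        dist = next_token_distribution(text)
        token = max(dist, key=dist.get)
        if token == "$":
            return text
        text += token

print(generate("12+34="))  # prints 12+34=46
```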

Show me on the anatomical prop where the magical "real reasoning" gland is.

How sure are you that this is correct?
