> The breakthrough combines neural networks...

This is the important part. It's not guaranteed to be accurate. They claim it "delivers essentially the same correctness as the model it imitates -- sometimes even finer detail". But can you really trust that? Especially if each frame of the simulation derives from the previous one, errors could compound.

It seems like a fantastic tool for quickly exploring hypotheses. But it seems like once you find the result you want to publish, you'll still need the supercomputer to verify it?

I don't know if it's the same thing, but it feels like an analogy:

Protein structure prediction is now considered to be "solved," but the way it was solved was not through physics applied to what is clearly a physics problem. Instead it was solved with lots of data, with protein language modeling, and with deep nets applied to contact maps (which are an old tool in the space), and some refinement at the end.

The end result is correct not because physics simulations are capable of doing the same thing and we could check Alphafold against it, but because we have thousands of solved crystal structures from decades of grueling lab work and electron density map reconstruction from thousands of people.

We still need that crystal structure to be sure of anything, but we can get really good first guesses with AlphaFold and the models that followed, and it has opened new avenues of research because a very very expensive certainty now has very very cheap mostly-right guesses.

When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.

Protein folding is in no way "solved". AlphaFold dramatically improved the state-of-the-art, and works very well for monomeric protein chains with structurally resolved nearest neighbors. It abjectly fails on the most interesting proteins: just go check out any of the industry's hottest undrugged targets (e.g. transcription factors).

> When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.

"When things are complicated, if I just dream that it is not complicated and solve another problem than the one I have, I find a great solution!"

Joking apart, models that can point you to a potentially very interesting sub-phase-space much smaller than the original one are incredibly useful. But a fundamental understanding of the underlying principles, which allows you to make very educated guesses about what can and cannot be ignored, usually wins against throwing everything at the wall...

And as you are pointing out, when complex reality comes knocking, it usually is much, much messier...

I have your spherical cow standing on a frictionless surface right here, sir. If you act quickly, I can include the "spherical gaussian sphere" addon with it, at no extra cost.

It’s interesting that we do essentially the same thing in all of non-physics science.

Everything is nuclear physics in the end, but trying to solve problems in, say, economics or psychology by solving a vast number of subatomic equations is only theoretically possible. Even in most of physics we have to round up and make abstractions.

I have a thing where I immediately doubt any ML paper that imitates a process then claims that the model is sometimes “even better” than the original process. This almost always means that there is an overzealous experimenter or a PI who didn’t know what they were dealing with.

Hello, lead author here. First: you are right! A surrogate model is a fancy interpolator, so eventually it will only be as good as the model it is trying to mimic, not better.

The piece that probably got lost in translation is that the codes we are mimicking have accuracy settings, which sometimes you can't push to maximum because of the computational cost. But with the kind of tools we are developing, we can push these settings when we are creating the training dataset (as this is cheaper than running the full analysis). In this way, the emulator might be more precise than the original code with "standard settings" (because it has been trained using more accurate settings).

This claim of course needs checking: if I am including an effect that might have a 0.1% impact on the final answer but the surrogate has an emulation error of order 1%, clearly the previous claim would not be true.
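To make that concrete, here is a toy sketch of the idea in code. The "code with accuracy settings", the extra 1% effect, and the plain interpolator are invented stand-ins for illustration, not our actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "a code with accuracy settings": the high-accuracy version
# includes a small extra effect that the standard settings drop.
def code(x, high_accuracy=False):
    y = np.sin(x)
    if high_accuracy:
        y = y + 0.01 * np.sin(7 * x)   # ~1% effect, "too expensive" at standard settings
    return y

# Build the training set with the accuracy settings pushed up...
X_train = np.linspace(0.0, np.pi, 500)
Y_train = code(X_train, high_accuracy=True)

# ...and emulate with the cheapest possible surrogate (linear interpolation here;
# the real emulator is a neural net, but the error-budget logic is the same).
def emulator(x):
    return np.interp(x, X_train, Y_train)

X_test = rng.uniform(0.0, np.pi, 1000)
truth = code(X_test, high_accuracy=True)
err_standard = np.max(np.abs(code(X_test) - truth))        # misses the extra effect: ~1e-2
err_emulator = np.max(np.abs(emulator(X_test) - truth))    # interpolation error: much smaller here

print(f"standard-settings code error: {err_standard:.1e}")
print(f"emulator error:               {err_emulator:.1e}")
# The claim only holds while the emulation error stays below the size of the
# extra effect; a 1% emulation error would wipe out a 0.1% physical effect.
```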

There are straightforward emulation settings in which a trained emulator can be more accurate than a single forward run, even when both training and "single forward run" use the same accuracy settings.

Suppose you emulate a forward model y = F(x), by choosing a design X = {x1, ..., xN}, and making a training set T = {(x1, y1), ..., (xN, yN)}.

With T, you train an emulator G. You want to know how good y0hat = G(x0) is compared to y0 = F(x0).

If there is a stochastic element to the forward model F, there will be noise in all of the y's, including in the training set, but also including y0! (Hopefully your noise has expectation 0.)

(This would be the case for a forward model that uses any kind of Monte Carlo under the hood.)

In this case, because the trained G(x0) is averaging over (say) all the nearby x's, you can see variance reduction in y0hat compared to y0. This, for example, would apply in a very direct way to G's that are kernel methods.
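Here is a minimal 1-D sketch of that variance reduction, with a Gaussian kernel smoother standing in for G. The toy forward model, the noise level, and the kernel width are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def F_true(x):
    """Noise-free forward model (unknown to us in practice)."""
    return np.sin(3 * x)

def F(x):
    """Stochastic forward model: truth plus zero-mean Monte Carlo noise."""
    return F_true(x) + rng.normal(0.0, 0.1, size=np.shape(x))

def G(x0, X, Y, h=0.05):
    """Kernel-smoothing emulator: Gaussian-weighted average of nearby training outputs."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

def rms(errors):
    return float(np.sqrt(np.mean(np.square(errors))))

X = np.linspace(0.0, 1.0, 200)   # design {x1, ..., xN}
x0, n_trials = 0.37, 2000

single_run_err = [F(x0) - F_true(x0) for _ in range(n_trials)]
emulator_err = [G(x0, X, F(X)) - F_true(x0) for _ in range(n_trials)]  # fresh noisy training set each trial

print("RMS error of a single forward run F(x0):", rms(single_run_err))  # ~ the noise level, 0.1
print("RMS error of the emulator G(x0)        :", rms(emulator_err))    # smaller: averaging reduces variance
# The price is a small smoothing bias when F_true varies over the kernel width h.
```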

I have observed this in real emulation problems. If you're pushing for high accuracy, it's not even rare to see.

More speculatively, one can imagine settings in which (deterministic) model error, when averaged out over nearby training samples in computing y0hat, can be smaller than the single-point model error affecting y0. (For example, there are some errors in a deterministic lookup table buried in the forward model, and averaging nearby runs of F causes the errors to decrease.)

I have seen this claim credibly made, but verifying it is hard -- the minute you find the model error that explains this[*], the model will be fixed and the problem will go away.

[*] E.g., with a plot of y0hat overlaid on y0, and the people who maintain the forward model say "do you have y0 and y0hat labeled correctly?"

That 'finer detail' sounds suspiciously like inventing significant digits from less significant inputs. You can interpolate, for sure, but it isn't going to add any information.

I'm not sure what you mean by that. Neural networks are pretty good statistical learning tools, and in this kind of application you'll need some stochastic learning regardless of whether you're using a laptop or a supercomputer. It's not like they used an LLM to predict the simulation steps. If you read the paper, they seem to use a simple fully-connected 5-layer neural network architecture, which is a completely different beast from, say, the trillion-parameter transformers used for LLMs.
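For a sense of scale, here is roughly what such a "simple fully-connected 5-layer network" emulator looks like in PyTorch. The widths, activation, and input/output sizes below are my guesses, not the paper's actual numbers:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a handful of simulation parameters in,
# a binned summary statistic out. Not the paper's exact numbers.
n_params, n_output_bins = 6, 100

# A plain 5-layer fully-connected network: at most a few hundred
# thousand weights, nothing like a trillion-parameter transformer.
emulator = nn.Sequential(
    nn.Linear(n_params, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_output_bins),
)

# Standard regression training against precomputed simulator outputs
# (X: parameter sets, Y: corresponding simulator results).
def train(emulator, X, Y, epochs=500, lr=1e-3):
    opt = torch.optim.Adam(emulator.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(emulator(X), Y)
        loss.backward()
        opt.step()
    return emulator

# Example call with random placeholder data standing in for real simulator runs:
X = torch.rand(256, n_params)
Y = torch.rand(256, n_output_bins)
train(emulator, X, Y, epochs=10)
```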

It's an approximator, right? I don't know about Astronomy but there are obvious use cases where an approximate result is "good enough" and even better than a precise result if it's significantly cheaper (or faster!) to get the approximation than to calculate the precise result.

In cases like this I'm always thinking of Riemann integrals and how I remember feeling my face scrunching up in distaste when they were first explained to us in class. It took a while for me to feel comfortable with the whole idea. I'm a very uh discrete kind of person.

As an aside, I consider the kind of work described in the article where a classic, symbolic system is essentially "compiled" into a neural net as one of the good forms of neuro-symbolic AI. Because it works and like I say there are important use cases where it beats just using the symbolic system.

Neuro-symbolic AI can often feel a bit like English cuisine: bangers (i.e. sausages) and mash, a full English (breakfast), or a Sunday roast, where a bunch of disparate ingredients are prepared essentially independently and then plonked on a plate all together. Most other cuisines don't work that way: you cook all the ingredients together and you get something bigger than the sum of the parts, a gestalt, if you like. Think e.g. of Greek gemista (tomatoes, bell peppers and occasionally zucchini and aubergines stuffed with rice) or French cassoulet (a bean stew with three different kinds of meat and a crusty top).

Lots of the neuro-symbolic stuff I've seen does it the English breakfast way: there's a neural net feeding its output to a symbolic system, rarely the other way around. But what the authors have done here, which others have also done, is to train a neural net on the output of a symbolic system, thereby basically "cooking" them together and getting the best of both worlds. Not yet a gestalt, as such, but close. Kind of like souvlaki with pitta (what the French call a "sandwich Grecque").

I like your analogies and I'd like to subscribe to your newsletter. (I'm also hungry now.)

I wrote that at lunchtime :P

I'm unfortunately not (self?) important enough to have a newsletter. Thanks though, that's very sweet.

Yet the idea of using the emulator to narrow down the viable space and then verifying with high-fidelity runs is still a huge win in terms of efficiency.
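In cartoon form, the loop looks something like this. The functions and numbers below are invented stand-ins, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: both functions score how well a parameter vector matches the data.
# In reality the emulator call is milliseconds and the simulation call is hours.
THETA_TRUE = np.array([0.3, 0.7, 0.1, 0.5, 0.9, 0.2])

def expensive_simulation(theta):
    return float(np.sum((theta - THETA_TRUE) ** 2))            # "exact" misfit

def cheap_emulator(theta):
    return expensive_simulation(theta) + rng.normal(0.0, 0.01)  # approximate misfit

# 1. Screen a large parameter sample with the emulator only.
candidates = rng.uniform(0.0, 1.0, size=(100_000, 6))
approx_misfit = np.array([cheap_emulator(t) for t in candidates])

# 2. Keep the small region the emulator says is viable.
shortlist = candidates[np.argsort(approx_misfit)[:20]]

# 3. Spend the expensive high-fidelity runs only on the shortlist, to verify.
verified = [t for t in shortlist if expensive_simulation(t) < 0.05]
print(len(verified), "of 20 shortlisted points survive high-fidelity verification")
```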

It will be more accurate than, say, the Monte Carlo simulations that were used to build the atomic bomb.

Physicists have been doing this sort of thing for a long time. Arguably they invented computers to do this sort of thing.

Computers were invented to break cryptography.

That depends entirely upon the definition of computer vs. calculator, and upon the distinction between "invented" (conceived) and "assembled and working".

ENIAC (1945) wasn't assembled for cryptography, nor was the Difference Engine (1820s) designed for that purpose.

Between these, the Polish bomba machines (1938) were adapted from other designs to break Enigma codes, but they lacked features of general-purpose computers like ENIAC.

Tommy Flowers' Colossus (1943–1945) was a rolling series of adaptations and upgrades purposed for cryptography, but it was programmed via switches and plugs rather than a stored program and lacked the ability to modify programs on the fly.

Thanks, this was going to be essentially my response. I'm glad you beat me to it so I didn't have to look up the dates.

But for the interested: von Neumann became one of the lead developers on the ENIAC. The von Neumann architecture is based on a write-up he did of the EDVAC. Von Neumann and Stanislaw Ulam worked out Monte Carlo simulations for the Manhattan Project.

The first programmable electronic computer was developed at the same time as randomized physics simulations and with the same people playing leading roles.

It reminds me of DLSS, with similar limitations.

> Especially if each frame of the simulation derives from the previous one

How do you think this universe works? To me that sounds exactly the same: every moment is derived from the previous instant.

Leaving aside the question of whether the universe is discrete or continuous, a simulation would still have lower "resolution" than the real world, and some information can be lost with each time step. To compensate for this, it can be helpful to have simulation step t+1 depend on both the step t and step t-1 states, even if this dependency seems "unphysical."
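A concrete example of a step that uses both the t and t-1 states is the classic Störmer/Verlet position update. Here it is for a toy harmonic oscillator (a generic textbook scheme, nothing specific to the article):

```python
import numpy as np

# Stormer/Verlet position update for x'' = -x:
#   x[t+1] = 2*x[t] - x[t-1] + a(x[t]) * dt**2
dt, n_steps = 0.01, 5000

def accel(x):
    return -x

x = np.empty(n_steps)
x[0] = 1.0
x[1] = x[0] + 0.5 * accel(x[0]) * dt**2   # bootstrap the first step (zero initial velocity)
for t in range(1, n_steps - 1):
    x[t + 1] = 2 * x[t] - x[t - 1] + accel(x[t]) * dt**2

# The physical system only needs its current state (position and velocity),
# but this discretization carries the velocity implicitly in the pair (x[t], x[t-1]).
print(x[-1], np.cos(dt * (n_steps - 1)))   # close to the exact solution cos(t)
```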

The universe evolves exactly under physical laws, but simulations only approximate those laws with limited data and finite precision. Each new frame builds on the last step’s slightly imperfect numbers, so errors can compound. Imagine trying to predict wind speeds with thermometers in the ocean — you can’t possibly measure every atom of water, so your starting picture is incomplete. As you advance the model forward in time, those small gaps and inaccuracies grow. That’s why “finer detail” from a coarse model usually isn’t new information, just interpolation or amplified noise.
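A tiny illustration of that compounding, using the logistic map as a stand-in for any sensitive iterated system (made-up numbers, nothing from the article):

```python
# How a tiny initial error compounds when each frame derives from the previous one.
r = 3.9  # chaotic regime of the logistic map

def step(x):
    return r * x * (1.0 - x)

x_exact, x_approx = 0.400000, 0.400001   # a one-in-a-million "measurement" error
for frame in range(60):
    if frame % 10 == 0:
        print(f"frame {frame:2d}: error = {abs(x_exact - x_approx):.2e}")
    x_exact, x_approx = step(x_exact), step(x_approx)
# Within a few dozen frames the two trajectories disagree completely.
```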

> The universe evolves exactly under physical laws

Has this been confirmed already? 1) It seems like the 'laws' we know are just an approximation of reality. 2) Even if no external intervention has been detected, it doesn't mean there was none.

Fine details. We are talking about an NN model vs. an algorithm. Both are approximations, and in practice the model can fill gaps in the data that the algorithm cannot, or does not by default. A good example would be image scaling with in-painting for scratches and damaged parts.

There are no frames in the real world, it literally does not work like that.

There are frames in simulations though! Typically measured as time steps. That the frame usually has N_d dimensions is insignificant.

There are frames in every digital signal. Like a simulation.