I’m glad this kind of work is getting highlighted on HN, but this is an extremely misleading title, to the point of being outright false. As often happens, this appears to be due to PR titles being controlled by non-specialists, not the study authors.
While the work the authors do is important, in no sense does the tool they produced actually run a simulation.
A simulation implies a physical model, usually partial differential equations that are often solved on supercomputers; here the neural network instead interpolates fixed simulation outputs in a purely data-driven way.
The simulations have not gotten faster due to neural networks; cosmologists have just gotten better at using them. Which is great!
Edit: see the sub-comment in the thread by crazygringo for the lead author’s take
The lead author explains the work in this comment: https://news.ycombinator.com/item?id=45381598
(It was a bit difficult to find by just scrolling)
The title and the article are both independently true, just not together :) As in, there are certainly cosmic simulations that once needed supercomputers which now run on a laptop, that's just the story of computational progress.
More seriously though, https://www.404media.co/a-vast-cosmic-web-connects-the-unive... is another nice article about this work. It has a bit less detail (i.e. fewer proper nouns in it), but is more readable for it IMO.
That said, I do think the work is still a big deal. Emulators like this are crucial for exploring parameter spaces and doing inference at scale
> The breakthrough combines neural networks...
This is the important part. It's not guaranteed to be accurate. They claim it "delivers essentially the same correctness as the model it imitates -- sometimes even finer detail". But can you really trust that? Especially if each frame of the simulation derives from the previous one, then errors could compound.
It seems like a fantastic tool for quickly exploring hypotheses. But seems like once you find the result you want to publish, you'll still need the supercomputer to verify?
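To make the compounding worry concrete, here is a toy calculation (nothing to do with this particular emulator, just the general point about autoregressive rollouts):

    # A small relative error applied at every step of an autoregressive
    # rollout compounds geometrically.
    per_step_error = 0.01                       # 1% error per emulated step
    steps = 100
    compounded = (1 + per_step_error) ** steps - 1
    print(f"{compounded:.0%}")                  # roughly 170% after 100 steps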
I don't know if it's the same thing, but it feels like an analogy:
Protein structure prediction is now considered to be "solved," but the way it was solved was not through physics applied to what is clearly a physics problem. Instead it was solved with lots of data, with protein language modeling, and with deep nets applied to contact maps (which are an old tool in the space), and some refinement at the end.
The end result is correct not because physics simulations are capable of doing the same thing and we could check AlphaFold against it, but because we have thousands of solved crystal structures from decades of grueling lab work and electron density map reconstruction from thousands of people.
We still need that crystal structure to be sure of anything, but we can get really good first guesses with AlphaFold and the models that followed, and it has opened new avenues of research because a very very expensive certainty now has very very cheap mostly-right guesses.
When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.
Protein folding is in no way "solved". AlphaFold dramatically improved the state-of-the-art, and works very well for monomeric protein chains with structurally resolved nearest neighbors. It abjectly fails on the most interesting proteins - just go check out any of the industry's hottest undrugged targets (e.g. transcription factors)
> When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.
"When things are complicated, if I just dream that it is not complicated and solve another problem than the one I have, I find a great solution!"
Joking apart, models that can help target a potentially very interesting sub-region of phase space, much smaller than the original one, are incredibly useful. But fundamental understanding of the underlying principles, which lets you make very educated guesses about what can and cannot be ignored, usually wins against throwing everything at the wall...
And as you are pointing out, when complex reality comes knocking, it is usually much, much messier...
I have your spherical cow standing on a frictionless surface right here, sir. If you act quickly, I can include the "spherical gaussian sphere" addon with it, at no extra cost.
It’s interesting that we do essentially the same thing in all of non-physics science.
Everything is nuclear physics in the end, but trying to solve problems in, say, economics or psychology by solving a vast number of subatomic equations is only theoretically possible. Even in most of physics we have to round up and make abstractions.
I have a thing where I immediately doubt any ML paper that imitates a process then claims that the model is sometimes “even better” than the original process. This almost always means that there is an overzealous experimenter or a PI who didn’t know what they were dealing with.
Hello, lead author here.

First: you are right! A surrogate model is a fancy interpolator, so eventually it will only be as good as the model it is trying to mimic, not better. The piece that probably got lost in translation is that the codes we are mimicking have some accuracy settings, which sometimes you can't push to maximum because of the computational cost. But with the kind of tools we are developing, we can push these settings when we are creating the training dataset (as this is cheaper than running the full analysis). In this way, the emulator might be more precise than the original code with "standard settings" (because it has been trained using more accurate settings). This claim of course needs checking: if I am including an effect that might shift the final answer by 0.1% but the surrogate has an emulation error of order 1%, clearly the previous claim would not hold.
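To make that error budget concrete, a toy check (numbers purely illustrative):

    # The "more accurate than standard settings" claim only survives if the
    # emulation error sits well below the extra effect captured by the
    # higher-accuracy training runs.
    effect_from_better_settings = 0.001   # a 0.1% shift in the final answer
    emulation_error             = 0.010   # a 1% surrogate error

    if emulation_error < effect_from_better_settings:
        print("emulator can plausibly beat the standard-settings code")
    else:
        print("emulation error swamps the extra accuracy; claim doesn't hold")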
There are straightforward emulation settings in which a trained emulator can be more accurate than a single forward run, even when both training and "single forward run" use the same accuracy settings.
Suppose you emulate a forward model y = F(x), by choosing a design X = {x1, ..., xN}, and making a training set T = {(x1, y1), ..., (xN, yN)}.
With T, you train an emulator G. You want to know how good y0hat = G(x0) is compared to y0 = F(x0).
If there is a stochastic element to the forward model F, there will be noise in all of the y's, including in the training set, but also including y0! (Hopefully your noise has expectation 0.)
(This would be the case for a forward model that uses any kind of Monte Carlo under the hood.)
In this case, because the trained G(x0) is averaging over (say) all the nearby x's, you can see variance reduction in y0hat compared to y0. This, for example, would apply in a very direct way to G's that are kernel methods.
I have observed this in real emulation problems. If you're pushing for high accuracy, it's not even rare to see.
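Here is a minimal sketch of that variance reduction (my own toy setup, not from any real emulation problem): F adds Monte Carlo-style noise to a smooth truth, and a simple Nadaraya-Watson kernel smoother plays the role of G.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.1                           # Monte Carlo noise in the forward model

    def F(x):                             # noisy forward model; the truth is sin(x)
        return np.sin(x) + rng.normal(0.0, sigma, size=np.shape(x))

    X = np.linspace(0.0, np.pi, 40)       # design
    Y = F(X)                              # training set (noisy!)

    def G(x0, h=0.15):                    # kernel emulator: weighted average of nearby y's
        w = np.exp(-0.5 * ((X - x0) / h) ** 2)
        return np.sum(w * Y) / np.sum(w)

    x0, truth = 1.3, np.sin(1.3)
    print(abs(G(x0) - truth))             # emulator error: noise averaged down
    print(abs(F(x0) - truth))             # a single fresh forward run: full noise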
More speculatively, one can imagine settings in which (deterministic) model error, when averaged out over nearby training samples in computing y0hat, can be smaller than the single-point model error affecting y0. (For example, there are some errors in a deterministic lookup table buried in the forward model, and averaging nearby runs of F causes the errors to decrease.)
I have seen this claim credibly made, but verifying it is hard -- the minute you find the model error that explains this[*], the model will be fixed and the problem will go away.
[*] E.g., with a plot of y0hat overlaid on y0, and the people who maintain the forward model say "do you have y0 and y0hat labeled correctly?"
That 'finer detail' sounds suspiciously like inventing significant digits from less significant inputs. You can interpolate, for sure, but it isn't going to add any information.
I'm not sure what you mean by that. Neural networks are pretty good statistical learning tools, and in this kind of application you'll need some stochastic learning, regardless of using a laptop or a supercomputer. It's not like they used an LLM to predict the simulation steps. If you read the paper, they seem to use a simple fully-connected 5-layer neural network architecture, which is a completely different beast from, say, the trillion-parameter transformers used for LLMs.
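For a sense of scale, a fully-connected net of that depth is tiny. The sketch below uses layer widths I made up (a handful of cosmological parameters in, a binned summary statistic out), not the paper's actual configuration:

    import torch.nn as nn

    # Hypothetical sizes; the paper's exact widths/activations may differ.
    emulator = nn.Sequential(
        nn.Linear(8, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 500),
    )
    print(sum(p.numel() for p in emulator.parameters()))   # ~45k parameters, not trillions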
It's an approximator, right? I don't know about Astronomy but there are obvious use cases where an approximate result is "good enough" and even better than a precise result if it's significantly cheaper (or faster!) to get the approximation than to calculate the precise result.
In cases like this I'm always thinking of Riemann integrals and how I remember feeling my face scrunching up in distaste when they were first explained to us in class. It took a while for me to feel comfortable with the whole idea. I'm a very uh discrete kind of person.
As an aside, I consider the kind of work described in the article where a classic, symbolic system is essentially "compiled" into a neural net as one of the good forms of neuro-symbolic AI. Because it works and like I say there are important use cases where it beats just using the symbolic system.
Neuro-symbolic AI can often feel a bit like English cuisine where stuff is like bangers (i.e. sausages) and mash or a full English (breakfast), or a Sunday roast, where a bunch of disparate ingredients are prepared essentially independently and separately and then plonked on a plate all together. Most other cuisines don't work that way: you cook all the ingredients together and you get something bigger than the sum of the parts, a gestalt, if you like. Think e.g. of Greek gemista (tomatoes, bell peppers and occasionally zucchini and aubergines stuffed with rice) or French cassoulet (a bean stew with three different kinds of meat and a crusty top).
Lots of the neuro-symbolic stuff I've seen do it the English breakfast way: there's a neural net feeding its output to a symbolic system, rarely the other way around. But what the authors have done here, which others have also done, is to train a neural net on the output of a symbolic system, thereby basically "cooking" them together and getting the best of both worlds. Not yet a gestalt, as such, but close. Kind of like souvlaki with pitta (what the French call a "sandwich Grecque").
I like your analogies and I'd like to subscribe to your newsletter. (I'm also hungry now.)
I wrote that at lunchtime :P
I'm unfortunately not (self?) important enough to have a newsletter. Thanks though, that's very sweet.
Yet the idea of using the emulator to narrow down the viable space and then verifying with high-fidelity runs is still a huge win in terms of efficiency
It will be more accurate than, say, the Monte Carlo simulations that were used to build the atomic bomb.
Physicists have been doing this sort of thing for a long time. Arguably they invented computers to do this sort of thing.
Computers were invented to break cryptography.
That depends entirely upon the definition of computer vs. calculator and upon the distinction between "invented" (conceived) vs "assembled and working".
ENIAC (1945) wasn't assembled for cryptography, nor was the Difference Engine (1820s) designed for that purpose.
Between these, the Polish bomba machines (1938) were adapted from other designs to break Enigma codes, but lacked the features of general-purpose computers like ENIAC.
Tommy Flowers' Colossus (1943–1945) was a rolling series of adaptations and upgrades purposed for cryptography, but it was programmed via switches and plugs rather than a stored program and lacked the ability to modify programs on the fly.
Thanks, this was going to be essentially my response. I'm glad you beat me to it so I didn't have to look up the dates.
But for the interested, Von Neumann became one of the lead developers on the ENIAC. The Von Neumann architecture is based on a writeup he did of the EDVAC. Von Neumann and Stanislaw Ulam worked out Monte Carlo simulations for the Manhattan Project.
The first programmable electronic computer was developed at the same time as randomized physics simulations and with the same people playing leading roles.
It reminds me of DLSS, with similar limitations.
> Especially if each frame of the simulation derives from the previous one.

How do you think this universe works? To me that sounds exactly the same. Every moment is derived from the previous instant.
Leaving aside the question of whether the universe is discrete or continuous, a simulation would still have lower "resolution" than the real world, and some information can be lost with each time step. To compensate for this, it can be helpful to have simulation step t+1 depend on both the step t and step t-1 states, even if this dependency seems "unphysical."
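For anyone wondering what that t/t-1 dependency looks like in practice, a standard two-step integrator is one example (a toy sketch of my own, nothing specific to the article):

    import numpy as np

    def f(y):                          # toy dynamics: dy/dt = -y, exact solution exp(-t)
        return -y

    dt, steps = 0.1, 50
    y_prev, y_curr = 1.0, np.exp(-dt)  # seed the first two states exactly

    for _ in range(steps):
        # two-step Adams-Bashforth: the next state uses BOTH step t and step t-1
        y_next = y_curr + dt * (1.5 * f(y_curr) - 0.5 * f(y_prev))
        y_prev, y_curr = y_curr, y_next

    print(y_curr, np.exp(-dt * (steps + 1)))   # numerical vs exact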
The universe evolves exactly under physical laws, but simulations only approximate those laws with limited data and finite precision. Each new frame builds on the last step’s slightly imperfect numbers, so errors can compound. Imagine trying to predict wind speeds with thermometers in the ocean — you can’t possibly measure every atom of water, so your starting picture is incomplete. As you advance the model forward in time, those small gaps and inaccuracies grow. That’s why “finer detail” from a coarse model usually isn’t new information, just interpolation or amplified noise.
> The universe evolves exactly under physical laws
Has this been confirmed already? 1) The 'laws' we know seem to be just an approximation of reality. 2) Even if no external intervention has been detected, it doesn't mean there was none.
Fine details. We are talking about an NN model vs. an algorithm. Both are approximations, and in practice the model can fill gaps in the data that the algorithm cannot, or does not by default. A good example would be image scaling with in-painting for scratches and damaged parts.
There are no frames in the real world, it literally does not work like that.
There are frames in simulations though! Typically measured as time steps. That the frame usually has N_d dimensions is insignificant.
There are frames in every digital signal. Like a simulation.
Nice work, Marco! I'm glad to see emulators being built for DESI. I worked on emulators for DES.
Emulators have existed in astrophysics since long before ML became part of the zeitgeist. The earliest cosmology emulator paper that I'm aware of is from 2009: https://arxiv.org/abs/0902.0429. IIRC the method came from fluid dynamics. It just so happens that the interpolators used under the hood (classically GPs, lately NNs) are also popular in ML, and so the method gets lumped into the ML pile.
The key difference between emulation and typical ML is that emulation is always in an interpolation setting, whereas typically predictive ML is extrapolating in one way or another (e.g. predicting future events).
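A quick way to see the interpolation framing (a toy GP example of my own, not the 2009 paper's setup): inside the training design the emulator is tightly constrained, while outside it the predictive uncertainty blows up toward the prior.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # Toy design over a single "cosmological parameter" and a stand-in summary statistic.
    X_train = np.linspace(0.1, 0.5, 10).reshape(-1, 1)
    y_train = np.sin(10 * X_train).ravel()

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), normalize_y=True)
    gp.fit(X_train, y_train)

    for x in ([[0.3]], [[0.9]]):                  # inside vs far outside the design
        mean, std = gp.predict(np.array(x), return_std=True)
        print(x, mean[0], std[0])                 # std: tiny inside, prior-sized outside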
Congrats to the authors!
Google also has a global weather model yielding ten-day predictions, and OpenStreetMap routing runs locally as well. Just today, with GraphHopper and a map of Europe, I can generate 2700 routes per second on my workstation. When I was young these were not things you could run at home!
Add to that Qwen3-Omni which can run on a well spec'd workstation, and will happily carry on natural language spoken conversations with you, and can work intelligently with images and video as well as all the other stuff LLMs already do.
I don't think Paramount would look kindly on giving it Majel Barrett's voice, but it sure feels like talking to the computer on the holodeck.
For personal use, I won't tell Paramount :)
It's wild how much has shifted in just a couple of decades
The emulator: https://github.com/CosmologicalEmulators/Effort.jl
Not cosmological, but yesterday Apple released an interesting protein folding model with a 3B-parameter transformer-based architecture that runs on M-series hardware and is competitive with state-of-the-art models. [1] Code [2]
[1] https://arxiv.org/pdf/2509.18480 [2] https://github.com/apple/ml-simplefold
Great to see a Julia project featured. I keep rooting for Julia to take off, but python's dominance in AI has me doubtful.
It is such a great language, and capable of ML and AI workloads, as evidenced by this research.
Simulating the simulation.
Starts feeling like a philosophical problem as much as a computational one
Words fail: simpulations, educated speculations, or whatever. The main thing is to preface these sorts of things with some indication of their limitations; they are purely theoretical results, so use them as such, and normalise the process of asking whether the model has been verified against an observation.
Feels like the kind of hybrid approach that could become the norm in other fields too
Moore's law surprises people way more than it should by now. This is an awesome project!
This has nothing to do with Moore's law. It's about neural networks.
Can we say that algorithmic speed-ups are sufficiently rare that they are usually impressive?
This isn't any algorithmic speedup.