I like this but based on what I am seeing here and the THRML readme, I would describe this as "an ML stack that is fully prepared for the Bayesian revolution of 2003-2015." A kind of AI equivalent of, like, post-9/11 airport security. I mean this in a value-neutral way, as personally I think that era of models was very beautiful.
The core idea of THRML, as I understand it, is to present a nice programming interface to hardware where coin-flipping is vanishingly cheap. This is moderately useful to deep learning, but the artisanally hand-crafted models of the mid-2000s did essentially nothing at all except flip coins, and it would have been enormously helpful to have something like this in the wild at that time.
The core "trick" of the era was to make certain very useful but intractable distributions built on something called "infinitely exchangeable sequences" merely almost intractable. The trick, roughly, was that conditioning on some measure space makes those sequences plain-old iid, which (via a small amount of graduate-level math) implies that a collection of "outcomes" can be thought of as a random sample of the underlying distribution. And that, in turn, meant that the model training regimens of the time did a lot of sampling, or coin-flipping, as we have said here.
Peruse the THRML README[1] and you'll see the who's who of techniques and modeling procedures of the time: "Gibbs sampling", "probabilistic graphical models", "energy-based models", and so on. All of these are weaponized coin flipping.
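To give a flavor of what "weaponized coin flipping" means in practice, here is a minimal Gibbs sampler for a toy Ising-style energy-based model, again in plain NumPy rather than anything resembling THRML's actual API. The inner loop is one biased coin flip per variable, with the bias set by the local field; that flip is exactly the operation you'd want the hardware to make nearly free.

```python
# Minimal sketch (plain NumPy, not THRML's API): Gibbs sampling a small
# Ising-style energy-based model. Each update is a single biased coin flip
# whose bias is a sigmoid of the local field.
import numpy as np

rng = np.random.default_rng(0)
n = 16
A = rng.normal(size=(n, n))
J = 0.2 * (A + A.T)                  # symmetric pairwise couplings
np.fill_diagonal(J, 0.0)
h = 0.1 * rng.normal(size=n)         # per-spin biases
s = rng.choice([-1, 1], size=n)      # current spin configuration

def gibbs_sweep(s):
    # Resample each spin from its exact conditional given all the others.
    for i in range(n):
        field = h[i] + J[i] @ s                    # local field (J[i, i] is 0)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(s_i = +1 | rest)
        s[i] = 1 if rng.random() < p_up else -1    # the coin flip
    return s

samples = []
for sweep in range(2000):
    s = gibbs_sweep(s)
    if sweep >= 500:                               # crude burn-in
        samples.append(s.copy())

print(np.mean(samples, axis=0))                    # estimated per-spin magnetizations
```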
I imagine the terminus of this school of thought is basically a natively-probabilistic programming environment. Garden-variety deterministic computing is essentially probabilistic computing where every statement returns a value with probability 1. So in that sense, probabilistic computing is a full generalization of deterministic computing, since an `if` might return a value with some probability other than 1. There was an entire genre of languages like this, e.g., Church. And now, 22 years later, we have our own hardware for it. (Incidentally, this line of inquiry is also how we know that conditional joint distributions are Turing complete.)
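A toy version of that Church-style idea, in plain Python rather than an actual probabilistic language: `flip` is the primitive, control flow branches on it, and a "query" is just running the program many times and conditioning by rejection. (The names and probabilities below are made up for illustration.)

```python
# Toy sketch of the Church-style idea in plain Python (not a real PPL):
# `flip` is the only source of randomness, `if` branches on it, and
# inference is rejection sampling over runs of the program.
import random

def flip(p=0.5):
    return random.random() < p

def model():
    rain      = flip(0.2)
    sprinkler = flip(0.01) if rain else flip(0.1)
    wet_lawn  = flip(0.9) if (rain or sprinkler) else flip(0.05)
    return rain, wet_lawn

# "Query": estimate P(rain | lawn is wet) by keeping only the runs
# consistent with the observation.
accepted = [rain for rain, wet in (model() for _ in range(100_000)) if wet]
print(sum(accepted) / len(accepted))
```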
Tragically, I think, this may have arrived too late. This is not nearly as helpful in the world of deep learning, with its large, ugly, and relatively sample-free models. Everyone hates to hear that you're cheering from the sidelines, but this time I really am. I think it's a great idea, just too late.
[1]: https://github.com/extropic-ai/thrml/blob/7f40e5cbc460a4e2e9...
Really informative insight, thanks. I'm not too familiar with those models; is there any chance that this hardware could lead to a renaissance of sample-based methods? Given efficient hardware, would they scale to LLM size, and/or would they allow ML to answer some types of currently unanswerable questions?
Any time something costs trillionths of a cent to do, there is an enormous economic incentive to turn literally everything you can into that thing. Since the 50s “that thing” has been arithmetic, and as a result, we’ve just spent 70 years trying to turn everything from HR records to images into arithmetic.
Whether “that thing” is about to be sampling is not for me to say. The probability is certainly not 0 though.