I found helpful this explanation of what Antithesis isn't:
> Property-based testing vs. Antithesis
> Property-based testing (PBT) uses random inputs to check individual data structures, procedures, or occasionally whole programs for high-level invariants or properties. Property-based testing has much in common with fuzzing—the main differences are heritage (PBT comes from the functional programming world, while fuzzing comes from the security/systems programming world) and focus (program functionality vs. security issues). Like fuzzing, PBT is generally only applicable to self-contained libraries and processes.
> Antithesis is analogous to applying PBT to an entire interacting software system—including systems that are concurrent, stateful, and interactive. Antithesis can randomly vary the inputs to a software program, and also the environment within which it runs. Like a PBT system, Antithesis is designed to check high-level properties and invariants of the system under test, but it can do so with many more types of software.
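For concreteness, a minimal property-based test in Python with the hypothesis library looks something like this; the run-length coder is just a toy stand-in for whatever code is under test:

```python
# A property-based test: hypothesis generates arbitrary strings and
# checks a high-level round-trip invariant over all of them.
from hypothesis import given, strategies as st

def rle_encode(s):
    out = []
    for c in s:
        if out and out[-1][0] == c:
            out[-1][1] += 1
        else:
            out.append([c, 1])
    return out

def rle_decode(pairs):
    return ''.join(c * n for c, n in pairs)

@given(st.text())
def test_roundtrip(s):
    # The property: decoding any encoding returns the original input.
    assert rle_decode(rle_encode(s)) == s
```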
I've scrubbed through the video, and it seems to be 100% talking-head filler except for an outro still image—no actual video information content at all unless you want to analyze Wilson's facial expressions or think he's hot.
Regular reminder that yt-dlp (--write-sub --write-auto-sub --sub-lang en) can download subtitles that you can read, grep, and excerpt, so you don't have to watch videos like this unless you like to.
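If you'd rather script that than remember the flags, yt-dlp also exposes a Python API whose option keys mirror the CLI flags; a minimal sketch:

```python
# Download only the English subtitles (manual and auto-generated),
# skipping the video itself. Option keys correspond to
# --write-sub / --write-auto-sub / --sub-lang / --skip-download.
import yt_dlp

opts = {
    'writesubtitles': True,      # uploader-provided subtitles, if any
    'writeautomaticsub': True,   # YouTube's auto-generated subtitles
    'subtitleslangs': ['en'],
    'skip_download': True,       # subtitles only, no video
}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=VIDEO_ID'])  # placeholder URL
```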
Great tip about downloading subtitles, useful!
Thanks for the auto-sub tip; I didn't know that was a feature.
How did you get yt-dlp to work? It used to work for me, but I did a fresh install a week ago, and now YouTube is giving me auth/cookie/sign-in errors (captcha, I presume?) when it didn't before.
As a general rule, you should update yt-dlp before using it. They release new versions very frequently to work around new walls on YouTube and other platforms. An update usually solves this kind of issue for me, even if I last updated just a few days ago.
(I haven't tried it today so can't speak to whether this is a complete solution in this particular case.)
At the moment I'm getting "HTTP Error 429: Too Many Requests" (with yt-dlp-2025.9.5 installed in a virtualenv via pip), which has been happening more often recently. I got it when downloading the Spanish subtitles file after successfully downloading the English one, so yt-dlp didn't continue on to try to download the video. But YouTube has also been working unreliably for me in the browser.
Edit: a few minutes later it worked, although I didn't let it download the whole video, because it was huge. The subtitle file, processed with http://canonical.org/~kragen/sw/dev3/devtt.py, comes to 12631 words. That's about 38 minutes of reading.
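(That estimate works out to a reading speed of roughly 330 words per minute, which is my assumption; the original doesn't state one:)

```python
# Back-of-the-envelope reading time: 12631 words at an assumed ~330 wpm.
words = 12631
wpm = 330                            # assumed typical reading speed
print(f"{words / wpm:.0f} minutes")  # -> 38 minutes
```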
One drawback of the transcript in this case is that it doesn't identify who's speaking. Otherwise it doesn't seem to contain many errors.
The key point seems to be this one (18'06"):
> But what you want to do is use guidance and feedback from the system under test to optimize the search, and notice when interesting things have happened, things that aren't necessarily bugs, but that are rare or special or unusual behavior. And so the test system can see that something interesting has happened and follow up opportunistically on that discovery. And that gives you a massive lift in the speed of finding issues.
> And the way that we're able to do that is with this sort of magical hypervisor that we've developed, which allows us to deterministically and perfectly recreate any past system state.
> So people generally think of the value of that hypervisor as: any issue we find is reproducible. No "works on my machine". If we find it once, we can repro it for you ad infinitum.
Including reproducibility of phenomena that aren't, strictly speaking, computational:
> All of the very low-level decisions about when threads get scheduled, or how long particular operations take, or exactly how long a packet takes to get from node A to node B, will reproduce 100% perfectly from run to run.
But, interestingly, they're not targeting things like multicore race conditions, even though their approach is the only way you could make them reproducible; instead they serialize execution into some thread interleaving, though they do vary the interleaving order from run to run (a toy sketch of that idea follows the quote):
> If you did it that way, with like a cycle-accurate CPU simulator, you could find all kinds of weird bugs that required true multicore parallelism or weird atomic memory operations, stuff like that. Yeah. Um, we are not trying to find those bugs, because 99.999% of developers can never even think about those bugs, right? We're trying to find more everyday type stuff.
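Here's that toy sketch (my illustration, not Antithesis code): cooperative "threads" advance one step at a time in an order chosen by a seeded PRNG, so any schedule is exactly replayable from its seed, and varying the seed explores different interleavings:

```python
# Deterministic thread interleaving: a seeded PRNG picks which task
# runs next, so the same seed always reproduces the same schedule,
# and different seeds explore different interleavings, all without
# true multicore parallelism.
import random

def worker(name, steps):
    for i in range(steps):
        yield f"{name} step {i}"

def run_schedule(seed):
    rng = random.Random(seed)
    tasks = [worker("A", 3), worker("B", 3)]
    trace = []
    while tasks:
        t = rng.choice(tasks)        # the scheduling decision, replayable
        try:
            trace.append(next(t))
        except StopIteration:
            tasks.remove(t)
    return trace

assert run_schedule(42) == run_schedule(42)  # same seed, same interleaving
print(run_schedule(42))
print(run_schedule(7))   # a different seed usually gives a different order
```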
Also:
> 99% of your CPU instructions are just executing on the host CPU, and it's very fast. Um, and so that means there's not much performance overhead at all to doing this, which is, I think, really important to making it actually practical.
I'm guessing this means they're using the hardware virtualization extensions on amd64 CPUs (Intel VT-x or AMD-V), just like Xen or whatever.
I found amusing the analogy of deterministic-replay-based time-travel fuzzing (like American Fuzzy Lop does) to save-scumming:
> But the crazy thing is, once I have a time machine, once I have a hypervisor, I can run until I make event A happen. And then if I notice that event A has happened, I can say: this is interesting, I want to now just focus on worlds where event A has happened. I don't need to re-find event A every single time. I can just lock it in, right? If you play computer games, it's like save-scumming, right? I can just save my state when I got the boss down to half health and now always reload from that point.
> And so it takes me a thousand trials to get event A to happen and now just another thousand to get B to happen instead of it taking a million trials if I always have to start from the start.
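The arithmetic behind that: if A and B each occur once per thousand trials independently, seeing both in one from-scratch run takes about a million trials in expectation, while checkpointing after A makes the costs additive:

```python
# Expected trial counts for hitting rare event A and then rare event B.
p_a = p_b = 1 / 1000

# From scratch, both must happen in the same run:
from_scratch = 1 / (p_a * p_b)        # 1,000,000 expected runs

# With a checkpoint: find A once, save the state, then search for B
# only in worlds where A has already happened:
with_checkpoint = 1 / p_a + 1 / p_b   # 2,000 expected runs

print(from_scratch, with_checkpoint)  # 1000000.0 2000.0
```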
A lot of the content of the interview won't be novel if you're familiar with things like afl-fuzz, data prevalence, or time-travel debugging, but it's pretty interesting to read about their experiences.
As far as I know, though, this is novel:
> When we actually do find a bug, we can then go back and ask: when did the bug become inevitable? This is kind of crazy.
> How?
> We can go back to the previous time that we reached in and changed the future, and we can try changing it to like a hundred different things and see if they all still hit the bug. And if they do, it means the bug was already baked in. And then we can go back to the next one before that and do the same thing.
> Yeah. Yeah.
> And we can sort of bisect backwards, and then we can find the exact moment when the bug went from really unlikely to really likely. And then we can do things like look at which lines of code were running then, look at what log messages were being printed then. And often that is actually enough to root-cause the bug, too.
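A sketch of how that backwards bisection could work, assuming hypothetical `checkpoints` and `replay_from(checkpoint, perturbation)` primitives standing in for the hypervisor's snapshot machinery (names and structure are mine, not Antithesis's):

```python
# "When did the bug become inevitable?": at each checkpoint, replay
# with many random perturbations of the future; if the bug survives
# them all, it was already baked in at that point. Since inevitability
# is monotone in time, we can binary-search for the earliest such
# checkpoint. `replay_from` is a hypothetical stand-in that returns
# True if the replayed run hits the bug.
import random

def bug_inevitable_at(checkpoint, replay_from, trials=100):
    return all(replay_from(checkpoint, perturbation=random.random())
               for _ in range(trials))

def find_inevitability_point(checkpoints, replay_from):
    # `checkpoints` is ordered earliest to latest, and the bug is
    # inevitable at the last one (the run that actually hit it).
    lo, hi = 0, len(checkpoints) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if bug_inevitable_at(checkpoints[mid], replay_from):
            hi = mid
        else:
            lo = mid + 1
    return checkpoints[lo]
```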