Here's a high level overview for a programmer audience (I'm listed as an author but my contributions were fairly minor):
[See specifics of the pipeline in Table 3 of the linked paper]
* There are 181 million ish essentially different Turing Machines with 5 states, first these were enumerated exhaustively
* Then, each machine was run for about 100 million steps. Of the 181 million, about 25% of them halt within this memany step, including the Champion, which ran for 47,176,870 steps before halting.
* This leaves 140 million machines which run for a long time.
So the question is: do those TMs run FOREVER, or have we just not run them long enough?
The goal of the BB challenge project was to answer this question. There is no universal algorithm that works on all TMs, so instead a series of (semi-)deciders were built. Each decider takes a TM, and (based on some proven heuristic) classifies it as either "definitely runs forever" or "maybe halts".
Four deciders ended up being used:
* Loops: run the TM for a while, and if it re-enters a previously-seen configuration, it definitely has to loop forever. Around 90% of machines do this or halt, so covers most.
6.01 million TMs remain.
* NGram CPS: abstractly simulates each TM, tracking a set of binary "n-grams" that are allowed to appear on each side. Computes an over-approximation of reachable states. If none of those abstract states enter the halting transition, then the original machine cannot halt either.
Covers 6.005 million TMs. Around 7000 TMs remain.
* RepWL: Attempts to derive counting rules that describe TM configurations. The NGram model can't "count", so this catches many machines whose patterns depend on parity. Covers 6557 TMs.
There are only a few hundred TMs left.
* FAR: Attempt to describe each TM's state as a regex/FSM.
* WFAR: like FAR but adds weighted edges, which allows some non-regular languages (like matching parentheses) to be described
* Sporadics: around 13 machines had complicated behavior that none of the previous deciders worked for. So hand-written proofs (later translated into Rocq) were written for these machines.
All of the deciders were eventually written in Rocq, which means that they are coupled with a formally-verified proof that they actually work as intended ("if this function returns True, then the corresponding mathematical TM must actually halt").
Hence, all 5-states TMs have been formally classified as halting or non-halting. The longest running halter is therefore the champion- it was already suspected to be the champion, but this proves that there wasn't any longer-running 5-state TM.
In Coq-BB5 machines are not run for 100M steps but directly thrown in the pipeline of deciders. Most halting machines are detected by Loops using low parameters (max 4,100 steps) and only 183 machines are simulated up to 47M steps to deduce halting.
In legacy bbchallenge's seed database, machines were run for exactly 47,176,870 steps, and we were left with about 80M nonhalting candidates. Discrepancy comes from Coq-BB5 not excluding 0RB and other minor factors.
Also, "There are 181 million ish essentially different Turing Machines with 5 states", it is important to note that we end up with this number only after the proof is completed, knowing this number is as hard as solving BB(5).
Why is knowing that there are 181 M 5-state machines difficult? Is it not just (read*write*move*(state+halt))^state? About 48^5, modulo "essentially different".
Number of 5-state TMs is 21^10 = 16,679,880,978,201; coming from (1 + writemovestate)^2*state; difference with your formula is that in our model, halting is encoded using undefined transition.
"essentially different" is not a statically-checked property. It is discovered by the enumeration algorithm (Tree Normal Form, see Section 3 of https://arxiv.org/pdf/2509.12337); in particular, for each machine, the algorithm needs to know it if will ever reach an undefined transition because if it does it will visit the children of that machine (i.e. all ways to set that reached undefined transition).
Knowing if an undefined transition will be ever reached is not computable, hence knowing the number of enumerated machines is not computable in general and is as hard as solving BB(5).
I guess the meaning of "essentially different" essentially depends on the strength of the mathematical theory that you used to classify them!
When I first heard it I thought about using some kind of similar symmetry arguments (e.g. swapping left-move and right-move). Maybe there are also more elaborate symmetry arguments of some kind.
Isn't it fair to say that there is no single objective definition of what differences between machines are "essential" here? If you have a stronger theory and stronger tools, you can draw more distinctions between TMs; with a weaker theory and weaker tools, you can draw fewer distinctions.
By analogy, suppose you were looking at groups. As you get a more sophisticated group theory you can understand more reasons or ways that two groups are "the same". I guess there is a natural limit of group isomorphism, but perhaps there are still other things where group structure or behavior is "the same" in some interesting or important ways even between non-isomorphic groups?
Turing machine program states are conventionally represented with letters: A, B, C, etc. The starting state is A.
Now suppose you are running a Turing machine program from the beginning. The only state it has visited so far is state A. It runs until it reaches a state that has not been visited yet. What state is it? B? C? D? According to "tree normal form", the name for that next state just is the earliest unused state name, which in this case is B.
Visited states so far are A and B. Run until an unvisited state is reached again. What is its name? C? D? E? Tree normal form says that the state will be called C. And the next newly visited state will be D, etc. In general, the canonical form for a Turing machine program will be the one that puts initial state visits in alphabetical order. (This concept also applies to the multi-color case.)
It's not possible to tell of an arbitrary TM program whether or not that program is in tree normal form. But this proof doesn't look at arbitrary programs. It generates candidate programs by tracing out the normal tree starting from the root, thereby bypassing non-normal programs altogether.
That is what "essentially different" means here.
If you count arbitrary transition tables, then you get a count of 63403380965376 [1].
If you count transition tables in which states are reachable and canonically ordered, then you get a count of 632700 * 4^10 = 663434035200 [2]. These machines can behave differently on arbitrary tape contents.
TNF further reduces these counts by examining machine behaviour when starting on an empty tape.
[1] https://oeis.org/A052200
[2] https://oeis.org/A107668
Just to directly address this - the key thing wrong in this formula is that it's ^2*state, not ^state. The state transition table operates both on the current state and the input you read from the tape, so for a binary turing machine with 5 states, you have a 10-entry transition table.
Ah, yes, of course the read bit should move into the exponent, since it's an input, not an output of the function. But the key point I was making is that there exists a formula. (I don't really care what the formula is.) The part I was not understanding was the complexity of "essentially different".
In this context, two syntactically-different TMs are considered "essentially the same" if all reachable states are the same up to reordering their labels (except for the fixed starting label A) and globally swapping the L/R directions.
The problem is knowing how many states are actually reachable, and how many are dead code. This is impossible to decide in general thanks to Rice's theorem and whatnot. In this case, it involves deciding all 4-state machines.
My intuition is that this would include every 4 state (and fewer) machine that has a fifth state that is never used. For every different configuration of that fifth state I would consider the machine to be essentially the same.
Thank you for the write-down, it is very interesting!
Is there some reason why Rocq is the proof assistant of choice and not one of the others?
also.... https://www.youtube.com/watch?v=c3sOuEv0E2I
It is as simple as: the person who contributed the proofs/implementations chose Rocq.
I did some small proofs in Dafny for a few of the simpler deciders (most of which didn't end up being used).
At the time, I found Dafny's "program extraction" (i.e. transpilation to a "real" programming language) very cumbersome, and produced extremely inefficient code. I think since then, a much improved Rust target has been implemented.
Its focus around constructible objects does lend itself well to the kind of proofs needed for non-halting. Usually they involve lemmas about the 'syntax' of a TM's configuration and how it evolves over time, and the actual number-based math is relatively simple. (With the exception of Skelet #17, which involves fiddly invariants of finite sequences.)
This is a fantastic summary that's very easy to understand. Thank you very much for writing it!
Thank you!
So how many sporadics would be left if we run the same on 6-state TM?
Very many - I think the number is several thousand. Several sporadic 6-state machines have been solved, though, and there are currently about 2400(?) unsolved machines. Among these are several Cryptids, machines whose halting problem is known to be mathematically hard
You guys need a blog to chase these down.
If nothing else, it’ll inspire the next generation of mathematicians.
Thanks for writing that up! Was there anything to be learned from those 13 sporadics or are they simply truly sporadic?
Thank you!
Yeah, one lesson is that if you relax your halting condition to entering a loop at some point you get much longer runtime, see machine Skelet#1 (https://bbchallenge.org/1LC1LE_---1LD_1RD0LD_1LA1RE_0LB0RC), enters a loop after 10^51 steps and the loop is 10^9 steps long.
> There is no universal algorithm that works on all TMs
Is this suspected or proven? I would love to read the proof if it exists.
It is indeed proven and the reason they're called Turing machines! https://en.wikipedia.org/wiki/Halting_problem
Doesn't the discovery of the fifth Busy Beaver value indicate that there is a decider for 5-state Turing machines?
Yes, there are deciders for all finite sets of TMs. You just cannot have one for all TMs.
I think actually for relatively small n we get cases where mathematics says nope, you can't decide that, the machine goes recursive and so now your decider may be looking at a machine which is itself running deciders and Kurt Gödel says "No".
Thanks for the hint to go looking some more. I found that Johannes Riebel has proven that BB(748) is undecidable. So for even small k there may not be deciders for them.
The suspicion is that this happens maybe as early as BB(15). We just can't prove that whereas we can prove BB(745) is not decidable, and, of course, we've decided BB(5) as we see here.
Yes. But there is no decider for n-state Turing machines that works regardless of n.
Scooping the Loop Snooper (an elementary proof of the undecidability of the halting problem)
No program can say what another will do. Now, I won't just assert that, I'll prove it to you: I will prove that although you might work til you drop, you can't predict whether a program will stop.
Imagine we have a procedure called P that will snoop in the source code of programs to see there aren't infinite loops that go round and around; and P prints the word "Fine!" if no looping is found.
You feed in your code, and the input it needs, and then P takes them both and it studies and reads and computes whether things will all end as the should (as opposed to going loopy the way that they could).
Well, the truth is that P cannot possibly be, because if you wrote it and gave it to me, I could use it to set up a logical bind that would shatter your reason and scramble your mind.
Here's the trick I would use - and it's simple to do. I'd define a procedure - we'll name the thing Q - that would take and program and call P (of course!) to tell if it looped, by reading the source;
And if so, Q would simply print "Loop!" and then stop; but if no, Q would go right back to the top, and start off again, looping endlessly back, til the universe dies and is frozen and black.
And this program called Q wouldn't stay on the shelf; I would run it, and (fiendishly) feed it itself. What behaviour results when I do this with Q? When it reads its own source, just what will it do?
If P warns of loops, Q will print "Loop!" and quit; yet P is supposed to speak truly of it. So if Q's going to quit, then P should say, "Fine!" - which will make Q go back to its very first line!
No matter what P would have done, Q will scoop it: Q uses P's output to make P look stupid. If P gets things right then it lies in its tooth; and if it speaks falsely, it's telling the truth!
I've created a paradox, neat as can be - and simply by using your putative P. When you assumed P you stepped into a snare; Your assumptions have led you right into my lair.
So, how to escape from this logical mess? I don't have to tell you; I'm sure you can guess. By reductio, there cannot possibly be a procedure that acts like the mythical P.
You can never discover mechanical means for predicting the acts of computing machines. It's something that cannot be done. So we users must find our own bugs; our computers are losers!
by Geoffrey K. Pullum Stevenson College University of California Santa Cruz, CA 95064
From Mathematics Magazine, October 2000, pp 319-320.
In principle, analysts could could analyze all machines of size K or less, classifying many of them as being halting or not, and all the rest could in fact be non-halting, but the analyst would never (at any finite time) know for sure whether they are all non-halting.
This is proven. It's known as the halting problem and is the central pillar of computational complexity theory. The proof was invented by Alan Turing and is online.
This is just the Halting Problem, many good introductory expositions of the proof exist.
how much computing time does it take to evaluate a machine to 100 million steps on a modern epyc system?
Thanks. Amazing work.