Just to address this directly: the key thing wrong in this formula is that the exponent should be 2*states, not states. The transition table is indexed by both the current state and the symbol you read from the tape, so a binary Turing machine with 5 states has a 10-entry transition table.
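To make the exponent concrete, here is a rough sketch of the count. The thread never pins down the exact formula, so the base is an assumption: each table entry picks a symbol to write, a direction, and a next state (optionally including a halt state). The names `table_entries` and `count_machines` are made up for illustration.

```python
def table_entries(states, symbols=2):
    """One entry per (current state, symbol read) pair."""
    return states * symbols


def count_machines(states, symbols=2, include_halt=True):
    """Count syntactically distinct transition tables.

    Assumption (not from the thread): each entry chooses a written symbol,
    a direction (L or R), and a next state, where the halt state counts as
    one extra target when include_halt is True.
    """
    targets = states + (1 if include_halt else 0)
    choices_per_entry = symbols * 2 * targets
    # The read symbol is an input, so it multiplies the exponent, not the base.
    return choices_per_entry ** table_entries(states, symbols)


print(table_entries(5))    # 10
print(count_machines(5))   # 24**10 under the assumptions above
```

Whatever the base turns out to be under a given convention, the exponent is the number of table entries, which is where the 2*states comes from.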
Ah, yes, of course the read bit should move into the exponent, since it's an input, not an output of the function. But the key point I was making is that there exists a formula. (I don't really care what the formula is.) The part I was not understanding was the complexity of "essentially different".
In this context, two syntactically-different TMs are considered "essentially the same" if all reachable states are the same up to reordering their labels (except for the fixed starting label A) and globally swapping the L/R directions.
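For what it's worth, the relabeling and mirroring part of that definition is easy to mechanize. Below is a minimal sketch that assumes a made-up encoding: a dict from `(state, symbol_read)` to `(symbol_written, move, next_state)`, with `"A"` as the start label and `"H"` as a halt label. It brute-forces every relabeling and treats all states as if they were reachable, so it deliberately ignores the reachability question.

```python
from itertools import permutations


def canonical(tm):
    """Smallest encoding over relabelings that fix 'A' and over the L/R mirror.

    tm maps (state, symbol_read) -> (symbol_written, move, next_state).
    'A' is the fixed start label and 'H' is a hypothetical halt label.
    """
    others = sorted({state for state, _ in tm} - {"A"})
    flip = {"L": "R", "R": "L"}
    best = None
    for perm in permutations(others):
        rename = {"A": "A", "H": "H", **dict(zip(others, perm))}
        for mirror in (False, True):
            encoded = tuple(sorted(
                (rename[s], sym, write, flip[move] if mirror else move, rename[nxt])
                for (s, sym), (write, move, nxt) in tm.items()
            ))
            if best is None or encoded < best:
                best = encoded
    return best


tm1 = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "C"),
    ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "B"),
    ("C", 0): (1, "L", "B"), ("C", 1): (1, "R", "H"),
}
# Same machine with labels B and C swapped and every direction mirrored.
tm2 = {
    ("A", 0): (1, "L", "C"), ("A", 1): (1, "R", "B"),
    ("C", 0): (1, "R", "A"), ("C", 1): (1, "L", "C"),
    ("B", 0): (1, "R", "C"), ("B", 1): (1, "L", "H"),
}
print(canonical(tm1) == canonical(tm2))
```

Two tables with the same canonical form are the same machine up to relabeling and mirroring. The brute force is factorial in the number of states, which is fine for a handful of states.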
The problem is knowing how many states are actually reachable, and how many are dead code. This is undecidable in general, thanks to Rice's theorem and whatnot. For the 5-state case, it involves deciding all 4-state machines.
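To illustrate the gap, here is a small sketch using a hypothetical dict encoding: `(state, symbol_read)` maps to `(symbol_written, move, next_state)`, with `"A"` as the start and `"H"` as halt. Walking the transition graph gives an upper bound on the reachable states, a bounded simulation from a blank tape gives a lower bound, and the true answer sits somewhere in between.

```python
from collections import deque


def graph_reachable(tm, start="A", halt="H"):
    """States reachable by following table entries, ignoring the tape (upper bound)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for sym in (0, 1):
            entry = tm.get((state, sym))
            if entry is None:
                continue
            nxt = entry[2]
            if nxt != halt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def visited_states(tm, steps, start="A", halt="H"):
    """States actually entered in the first `steps` steps on a blank tape (lower bound)."""
    tape, pos, state = {}, 0, start
    visited = {start}
    for _ in range(steps):
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        if nxt == halt:
            break
        state = nxt
        visited.add(state)
    return visited


# The head only ever moves right onto blank cells, so B never reads a 1
# and the transition into C is dead code.
tm = {
    ("A", 0): (1, "R", "B"), ("A", 1): (1, "R", "A"),
    ("B", 0): (1, "R", "A"), ("B", 1): (1, "R", "C"),
    ("C", 0): (1, "L", "C"), ("C", 1): (1, "R", "H"),
}
print(sorted(graph_reachable(tm)))       # ['A', 'B', 'C']
print(sorted(visited_states(tm, 1000)))  # ['A', 'B']
```

In this toy case the dead state is easy to spot by hand. In general, no fixed step budget closes the gap between the two bounds, which is the undecidable part.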