Unless each iteration is 90% faster

This.

In fact, it can be slower because hardware is probably not optimized for the 1-bit case, so there may be a lot of low-hanging fruit for hardware designers, and we may see improvements in the next generation of hardware.

Isn't digital (binary) hardware literally optimized for the 1-bit case, by definition?

People are confusing word size…

The CPU can handle up to word-size bits at once. I believe they mean that a lot of assembly was written for integer math, not bit math (word sizes are 4+ bytes on modern CPUs). However, it is unlikely we'll see improvements in this area because, by definition, using 64-bit floats already uses the max word size. So… that's the max throughput. Sending 1 bit at a time instead of 64 bits would be considerably slower, so this entire approach is funny.
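In rough C terms (an illustrative sketch, not anyone's actual kernel; both function names are hypothetical): a 64-bit instruction moves 64 bits whether you use them or not, so handling weights one bit at a time wastes almost the entire word.

```c
#include <stdint.h>

/* One 64-bit multiply-accumulate consumes a full word per operand. */
double mac_f64(double acc, double w, double x) {
    return acc + w * x;            /* 64 useful bits of weight per op */
}

/* Naive 1-bit handling: the weight still occupies a whole register
 * and a whole instruction, so 63 of the 64 bits do no work. */
int32_t mac_1bit_naive(int32_t acc, uint64_t w_bit, int32_t x) {
    return acc + (w_bit ? x : -x); /* 1 useful bit of weight per op */
}
```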

No, because there are algorithmic shortcuts that allow approximations and skipped steps compared to a strict binary step-by-step calculation: in-memory bit reads, implicit rules, and other structural advantages in how GPU and CPU instruction sets are implemented in hardware.
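For example, the standard shortcut is bit-packing: 64 binary weights fit in one 64-bit word, and a whole chunk of a {-1, +1} dot product reduces to XOR plus popcount. A minimal sketch in C, assuming GCC/Clang for __builtin_popcountll:

```c
#include <stdint.h>

/* Dot product of two {-1,+1} vectors, 64 elements per word.
 * Encoding: bit = 1 means +1, bit = 0 means -1.
 * Each position contributes +1 when the bits match, -1 otherwise,
 * so: dot = matches - mismatches = 64*n_words - 2*popcount(xor). */
int64_t bin_dot(const uint64_t *a, const uint64_t *b, int n_words) {
    int64_t mismatches = 0;
    for (int i = 0; i < n_words; i++)
        mismatches += __builtin_popcountll(a[i] ^ b[i]);
    return 64LL * n_words - 2 * mismatches;
}
```

One instruction then does useful work on 64 weights instead of one, which is why packed 1-bit kernels don't pay the bit-at-a-time penalty described above.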

FPGAs could be highly competitive for models with unusual but small bit widths, especially single bits, since their optimizers handle that easily.
