This paper ignores 50+ years of research on quantized networks and quantized training algorithms, and reaches wrong conclusions out of sheer ignorance.
TLDR abstract of a draft paper I wrote years ago, for those interested in the real limits of quantized networks:
We investigate the storage capacity of single-layer threshold neurons under three synaptic precision regimes - binary (1-bit), ternary (≈1.585-bit), and quaternary (2-bit) - from both information-theoretic and algorithmic standpoints. While the Gardner bound stipulates maximal loads of α = 0.83, 1.5 and 2.0 patterns per weight for the three regimes, practical algorithms only reach α_alg ≈ 0.72, 1.0 and 2.0, respectively. By converting these densities into storage-efficiency metrics - bits of synaptic memory per stored pattern - we demonstrate that only quaternary weights achieve the theoretical optimum in realistic settings, requiring exactly 1 bit of memory per pattern. Binary and ternary schemes incur 39% and 58% overheads, respectively.
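For concreteness, here is a minimal back-of-the-envelope sketch (plain Python, not from the paper) showing how the overhead figures fall out of the α values quoted above; the numbers are taken straight from the abstract and the 1-bit-per-pattern optimum is the reference point:

    # Bits of synaptic memory needed per stored pattern = bits per weight / alpha.
    # alpha values below are the ones quoted in the abstract (illustrative only).
    regimes = {
        # name: (bits per weight, Gardner-bound alpha, algorithmic alpha)
        "binary":     (1.0,   0.83, 0.72),
        "ternary":    (1.585, 1.5,  1.0),
        "quaternary": (2.0,   2.0,  2.0),
    }

    for name, (bits_per_weight, alpha_gardner, alpha_alg) in regimes.items():
        bits_theory = bits_per_weight / alpha_gardner   # at the Gardner bound
        bits_alg = bits_per_weight / alpha_alg          # with practical algorithms
        overhead = (bits_alg - 1.0) * 100               # % above the 1-bit optimum
        print(f"{name:>10}: theory {bits_theory:.2f} b/pattern, "
              f"algorithmic {bits_alg:.2f} b/pattern ({overhead:.0f}% overhead)")

Running this reproduces the figures in the abstract: binary lands at about 1.39 bits per pattern (39% overhead), ternary at about 1.58 (≈58%), and quaternary at exactly 1 bit per pattern.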
Is this actually equivalent to classical forms of quantization, though? The paper has an extensive discussion of quantization on pages 2 and 3. It is not just a rehash of earlier work, but pushes single-bit precision into more parts of the system.