I think that oversimplifies it a bit. You can't train without hardware, which is why Chinese companies are illegally importing Nvidia cards [1].

[1] https://www.theinformation.com/articles/deepseek-using-banne...

Smuggled NVIDIA GPUs have become much less important for AI, because NVIDIA's exclusion from the Chinese market has let domestic GPU production grow.

Moreover, China has just demonstrated a supercomputer faster than any US supercomputer. Unlike the US machines, which rely on GPUs, it achieves its high computational throughput with custom Chinese-designed CPUs (implementing the Armv9-A ISA with SME, the Scalable Matrix Extension, plus BF16/INT8 operations for AI).

The CPUs used in that supercomputer reach a computational throughput and a memory bandwidth high enough to train any LLM (they have fast HBM memory). Their only disadvantage compared with the best NVIDIA GPUs is slightly lower energy efficiency, but China has abundant cheap energy, so this is not a serious disadvantage for them.

SIMD programmers must be paid very well in China, then... Jokes aside, 2 or 3 years ago I thought it was becoming inevitable that CPU designs would turn into extended versions of their already quite capable vector execution units.

There is significant evidence they are transitioning to Huawei and other home-grown CPUs and NPUs.

It was announced in April that DeepSeek v4 ran on Huawei Ascend chips at launch. DeepSeek then shared details of its implementation with other Chinese providers to strengthen the Chinese market against import restrictions (more buyers for Huawei means more production and cheaper capacity).