Hacker News

From the readme:

More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.

The maximum number of nodes is equal to the number of KV heads in the model #70.
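
To make that limit concrete, here is a rough sketch of why KV heads would cap the node count under tensor parallelism: each node needs at least one whole KV head to own. This is not the project's code. The helper name `split_kv_heads` and the even-divisibility requirement are my own assumptions for illustration.

```python
def split_kv_heads(n_kv_heads: int, n_nodes: int) -> list[range]:
    """Assign a contiguous block of KV heads to each node.

    Hypothetical helper: assumes heads are split evenly and that a single
    head is never divided between nodes.
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    if n_nodes > n_kv_heads:
        # More nodes than KV heads would leave some node with nothing to own.
        raise ValueError(
            f"{n_nodes} nodes requested, but the model has only {n_kv_heads} KV heads"
        )
    if n_kv_heads % n_nodes != 0:
        # Assumption: an uneven split is rejected rather than balanced.
        raise ValueError("KV heads must divide evenly across nodes")
    per_node = n_kv_heads // n_nodes
    return [range(i * per_node, (i + 1) * per_node) for i in range(n_nodes)]


if __name__ == "__main__":
    # Example: a model with 8 KV heads spread over 4 nodes.
    for node, heads in enumerate(split_kv_heads(8, 4)):
        print(f"node {node}: KV heads {list(heads)}")
```

With 8 KV heads, asking for 16 nodes raises an error in this sketch, which is the same ceiling the readme describes.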

I found this article[1] to be a nice overview of the parallelism modes.