Is this actually more beneficial than, say, having a bunch of smaller ones communicating on a bus? Apart from space constraints, that is.

It's a single wafer, not a single compute core. A familiar equivalent might be putting 192 cores in a single Epyc CPU (or, to be more technically accurate, the group of cores on a single CCD) rather than trying to externally interconnect 192 separate single-core CPUs.

Yes, bandwidth within a chip is much higher than on a bus.