Go has taken exactly this stance, at least since Go 1.22 from 18 months ago.

They settled on 8 rounds of ChaCha with 300 bytes of internal state to amortize the latency. The end result is something only 1.5-2x slower than (their particular flavor of) PCG [1]. It was deemed good enough to become the default.

[1]: https://go.dev/blog/chacha8rand#performance