For workstation inference, a unified memory architecture would strike a good cost/performance balance while keeping COGS reasonable.

Macs with 512GB of unified memory are available, with the RAM upgrade costing a few grand.
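
To get a rough feel for what fits in that much memory, here is a back-of-the-envelope sketch. The function names, the 25% headroom for KV cache, OS, and runtime overhead, and the example model sizes are all illustrative assumptions, not measured figures.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float = 512, headroom_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving headroom.

    headroom_fraction is an assumed reserve for KV cache, OS, and
    runtime overhead; tune it for your own workload.
    """
    usable_gb = memory_gb * (1 - headroom_fraction)
    return weights_gb(params_billions, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Hypothetical model sizes and quantization levels, for illustration only.
    for params, bits in [(70, 16), (400, 16), (400, 4)]:
        print(f"{params}B @ {bits}-bit: {weights_gb(params, bits):.0f} GB, "
              f"fits in 512 GB: {fits(params, bits)}")
```

This only counts weights plus a flat reserve. Long contexts or large batch sizes can push KV cache well past that reserve, so treat the result as a first filter rather than a guarantee.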