Isn't local inference infeasible for models of a useful size (at least on a typical dev machine with <= 64 GB of RAM and a single GPU)?
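
For a rough sense of whether that hardware is enough, here's a back-of-envelope sketch. The numbers are my assumptions, not from the question: weights-only footprint, 4-bit quantization, and roughly 20% extra for KV cache and runtime overhead.

```python
# Rough memory estimate for running a quantized model locally.
# Illustrative assumptions: 4-bit weights, ~20% overhead for KV cache/runtime.

def estimated_memory_gb(params_billion: float, bits_per_weight: int = 4,
                        overhead: float = 0.2) -> float:
    """Approximate RAM/VRAM footprint in GB: weights plus runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9

for params in (7, 13, 34, 70):
    print(f"{params}B @ 4-bit: ~{estimated_memory_gb(params):.0f} GB")
# 7B:  ~4 GB   -> fits on most single GPUs
# 13B: ~8 GB   -> fits on a mid-range GPU
# 34B: ~20 GB  -> needs a large GPU or CPU offload
# 70B: ~42 GB  -> won't fit on a typical single GPU, but fits in 64 GB system RAM
```

By this estimate, models up to the mid-30B range are plausible on a single consumer GPU, and even ~70B models can run (slowly) from 64 GB of system RAM with aggressive quantization, so "infeasible" depends mostly on how large a model counts as "useful" and how much latency you'll tolerate.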