IMO local models are kind of inevitable.
Hardware vendors will ship efficient PCIe inference chips, and innovations in RAM architecture will make even mid-level devices capable of running 120B-parameter models locally.
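Rough back-of-envelope (assuming 4-bit quantization, which is typical for local inference): 120B parameters × ~0.5 bytes per parameter ≈ 60 GB of weights, plus KV cache and runtime overhead, so realistically 64-96 GB of fast memory. Which is why the RAM/bandwidth side matters at least as much as raw compute.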
Open source models will get good enough that there isn’t a meaningful difference between them and the closed source offerings.
Hardware is relatively cheap; vendors just haven't had enough iteration cycles yet to get local-inference-capable devices into people's hands.
I give it 5 years or so before this is the standard.