When/if gains slow down, I can definitely see branching out into hardware sold for on-prem inference, once models can be etched into silicon as chips with hardwired weights. I'd guess that's at least 5 years away, though.

I think this is inevitable. Sooner or later, model-specific ASICs will make economic sense. We're already seeing it happen with Taalas/Cerebras, so I think it's sooner than 5 years. And inference is an order of magnitude faster, which is amazing.