To run a model locally, they would need to release the weights to the public, and therefore to their competitors. These are flagship models.

They would also need to shrink the models way down just to fit. And even then, generating tokens on Apple's Neural Engine would be waaaaaay slower than an HTTP request to a monster GPU in the sky. Local LLMs, in my experience, are either painfully dumb or painfully slow.
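Back-of-envelope, the speed gap is mostly memory bandwidth: single-stream decoding reads roughly the whole model once per token, so tokens/sec tops out near bandwidth divided by model size. A minimal sketch of that arithmetic, where the model size and bandwidth figures are ballpark assumptions rather than benchmarks:

```python
# Rough, bandwidth-bound estimate of single-stream decode speed:
# tokens/sec ~= memory bandwidth / bytes read per token (~ model size).
# All numbers are ballpark assumptions, not measurements.

MODEL_GB = 35  # e.g. a ~70B-parameter model quantized to ~4 bits/weight

hardware_gbps = {
    "Apple M2 Pro (~200 GB/s)": 200,
    "Apple M3 Max (~400 GB/s)": 400,
    "NVIDIA H100 SXM (~3350 GB/s)": 3350,
}

for name, gbps in hardware_gbps.items():
    # Each generated token streams the full weights through memory once.
    tokens_per_sec = gbps / MODEL_GB
    print(f"{name}: ~{tokens_per_sec:.0f} tokens/sec")
```

That prints roughly 6, 11, and 96 tokens/sec respectively, and it's generous to the local side: the actual flagship models don't fit in 35 GB at all, and datacenter serving batches many requests over one weight read, widening the gap further.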

Hence the "come on".

Not if they knew how terrible it would be.