> it will be really slow (multiple seconds per token!)

This is fun as a proof that it can be done, but it's roughly 100X slower than hosted models and 1000X slower than GPT-Codex-Spark.
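
To make those ratios concrete, here's a back-of-envelope sketch. The specific numbers are assumptions, not measurements: I'm taking "multiple seconds per token" as ~3 s/token locally, and treating the 100X and 1000X figures as throughput multipliers.

```python
# Back-of-envelope throughput comparison. All speeds are assumed,
# illustrative figures, not benchmarks: ~3 s/token locally, with
# hosted models ~100X faster and GPT-Codex-Spark ~1000X faster.
LOCAL_SECONDS_PER_TOKEN = 3.0  # "multiple seconds per token"

speeds_tok_per_s = {
    "local":           1 / LOCAL_SECONDS_PER_TOKEN,
    "hosted model":    100 / LOCAL_SECONDS_PER_TOKEN,   # ~100X local
    "GPT-Codex-Spark": 1000 / LOCAL_SECONDS_PER_TOKEN,  # ~1000X local
}

RESPONSE_TOKENS = 500  # an assumed medium-length reply

for name, tps in speeds_tok_per_s.items():
    seconds = RESPONSE_TOKENS / tps
    print(f"{name:>16}: {tps:7.2f} tok/s -> {seconds:7.1f} s per {RESPONSE_TOKENS}-token reply")
```

Under those assumptions, a 500-token reply takes about 25 minutes locally versus 15 seconds hosted and 1.5 seconds on the fastest tier.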

That's like going from a real-time conversation to emailing someone who only checks their inbox twice a day, if you're lucky.