Cool approach. So basically the part that needs to be realtime - the voice that speaks back to you - can be a bit dumb so long as the slower-moving genius behind the curtain is making the right things happen.

Yes, exactly. One part I did not like is that we also have to transcribe the user's speech separately, because it only provides what the AI said, not what the person said.
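
A minimal sketch of how the two transcripts could be stitched back into one conversation log, assuming both the AI-side transcript and the separately produced user transcript arrive as timestamped segments already sorted by time. The `Segment` type and both function names are hypothetical, made up for illustration, and not taken from any particular API:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed utterance (hypothetical shape, for illustration only)."""
    start: float   # seconds since the call began
    speaker: str   # "user" or "ai"
    text: str


def merge_transcripts(ai_segments, user_segments):
    """Interleave two time-sorted segment lists into a single ordered log."""
    # heapq.merge assumes each input is already sorted by the same key.
    return list(heapq.merge(ai_segments, user_segments, key=lambda s: s.start))


def format_log(segments):
    """Render the merged segments as one line per utterance."""
    return "\n".join(f"[{s.start:06.2f}] {s.speaker}: {s.text}" for s in segments)


if __name__ == "__main__":
    ai = [Segment(1.5, "ai", "Hi, how can I help?"), Segment(6.0, "ai", "Sure.")]
    user = [Segment(3.2, "user", "Book a table for two.")]
    print(format_log(merge_transcripts(ai, user)))
```

The merge only works if both sources stamp segments against the same clock, so in practice the separate transcriber would need timestamps aligned to the start of the call.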