I forked it and added tool calling by running another LLM in parallel to infer when to call tools. It works well for me for toggling lights on and off.

Code updates here https://github.com/taf2/personaplex
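A minimal sketch of that parallel tool-calling idea, assuming the realtime voice loop can hand each user utterance to a background worker. All names here (the queue, the tool registry, the model choice) are hypothetical illustrations, not the personaplex API or the fork's actual code:

```python
# Sketch: a second, slower LLM runs in parallel with the voice model and
# decides when to fire a tool. Names below are hypothetical, not personaplex.
import json
import queue
import threading

from openai import OpenAI  # any chat-completions-compatible endpoint works

client = OpenAI()  # assumes OPENAI_API_KEY or a local server is configured

def toggle_lights(on: bool) -> None:
    # placeholder for a real smart-home call
    print(f"lights {'on' if on else 'off'}")

TOOLS = {"toggle_lights": toggle_lights}

transcript_q: "queue.Queue[str]" = queue.Queue()

def tool_worker() -> None:
    # Runs beside the realtime voice loop; latency here doesn't block speech.
    while True:
        utterance = transcript_q.get()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any small, fast instruct model will do
            messages=[
                {"role": "system",
                 "content": 'If the user asks to control the lights, reply with JSON '
                            'like {"tool": "toggle_lights", "on": true}; otherwise reply none.'},
                {"role": "user", "content": utterance},
            ],
        )
        text = resp.choices[0].message.content.strip()
        if text != "none":
            try:
                call = json.loads(text)
                TOOLS[call["tool"]](call["on"])
            except (json.JSONDecodeError, KeyError, TypeError):
                pass  # reply wasn't a usable tool call; ignore it

threading.Thread(target=tool_worker, daemon=True).start()

# The realtime voice loop (not shown) pushes each user utterance here:
transcript_q.put("hey, can you turn the lights off?")
```

The point is just that the tool decision lives off the realtime path: the voice model keeps talking while the slower model decides whether anything needs to happen in the real world.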

Cool approach. So basically the part that needs to be real-time - the voice that speaks back to you - can be a bit dumb, so long as the slower-moving genius behind the curtain is making the right things happen.

Yes, exactly. One part I did not like: we also have to transcribe the user separately, because the model only provides what the AI said, not what the person said.
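A minimal sketch of that extra transcription pass, assuming the user's audio is captured per turn as a WAV file. faster-whisper is just one option here, not necessarily what the fork uses:

```python
# Sketch: the voice model only reports its own text, so the user's side has to
# go through a separate STT pass before it can feed the tool-calling LLM.
from faster_whisper import WhisperModel

stt = WhisperModel("base", device="cpu", compute_type="int8")

def transcribe_user_turn(wav_path: str) -> str:
    # Concatenate the recognized segments into one utterance string.
    segments, _info = stt.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

# This text is what would get pushed to the parallel tool-calling worker.
print(transcribe_user_turn("user_turn.wav"))
```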