This is cool. It makes me want an unsloth quant though! A 7b local model with tool calling would be genuinely useful, although I understand this is not that.
UPDATE: I'd skip this for now. It doesn't allow any kind of interactive conversation, as I learned after downloading 5 GB of models; it's a proof of concept that takes a WAV file as input.
I forked it and added tool calling by running another LLM in parallel to infer when to call tools. It works well for me for toggling lights on and off.
Code updates here https://github.com/taf2/personaplex
Cool approach. So basically the part that needs to be realtime - the voice that speaks back to you - can be a bit dumb so long as the slower-moving genius behind the curtain is making the right things happen.
Yes, exactly. One part I didn't like: we also have to transcribe separately, because it only provides what the AI said, not what the person said.
It provides a voice assistant demo in /Examples/PersonaPlexDemo, which lets you try turn-based conversations. Real-time conversation isn't implemented, though.
> I'd skip this for now - it does not allow any kind of interactive conversation - as I learned after downloading 5G of models - it's a proof of concept that takes a wav file in.
I haven't looked into it that much, but to my understanding: a) you just need an audio buffer, and b) they seem to support streaming (or at least it's planned).
> Looking at the library’s trajectory — ASR, streaming TTS, multilingual synthesis, and now speech-to-speech — the clear direction was always streaming voice processing. With this release, PersonaPlex supports it.
> You just need an audio buffer
Doing that alone right on macOS using Swift is an exercise in pain that even coding bots can't solve right the first time :)
I beg to differ. My agent just one-shotted a MicrophoneBufferManager in Swift when asked.
Complete with AVFoundation and a tap for the audio buffer.
It really is trivial.
Any chance of pushing it to GitHub? My swift knowledge could be written out on an oversized beer coaster currently, so I'm still collecting useful snippets
https://gist.github.com/gabereiser/cd8c67262717afd2539dc9c3d...
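For anyone collecting snippets: the core of such a class is just an AVAudioEngine tap on the input node. A minimal sketch below (my own reconstruction, not necessarily what the gist does; the class name `MicrophoneBufferManager` is borrowed from the comment above, and the buffer size is an arbitrary choice):

```swift
import AVFoundation

// Minimal microphone capture: installs a tap on the engine's input node
// and hands each PCM buffer to a callback.
final class MicrophoneBufferManager {
    private let engine = AVAudioEngine()
    var onBuffer: ((AVAudioPCMBuffer) -> Void)?

    func start() throws {
        let input = engine.inputNode
        // Use the hardware's native format to avoid an implicit conversion.
        let format = input.outputFormat(forBus: 0)
        // 4096 frames per callback is an arbitrary but common choice
        // (~85 ms at 48 kHz).
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { [weak self] buffer, _ in
            self?.onBuffer?(buffer)
        }
        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```

Note you'll still need microphone permission (NSMicrophoneUsageDescription in Info.plist for a bundled app), and the tap callback fires on an audio thread, so hop queues before touching UI.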
I've also had great results with using LLMs to pry into Apple's private and undocumented APIs. I've been impressed with the lack of hallucinations for C/C++ and Obj-C functions.
I can attest that the quality in this domain has greatly improved over the years too. I am not always a fan of the quality of the Swift code that my LLM produces, but I am impressed that what is produced often works in one shot as well. The quality also is not that important to me because I can just refactor the logic myself, and often prefer to do it anyway. I cannot hold an LLM to any idiosyncrasies that I do not share with it.
Exactly. Even if it’s a skeleton, as long as it does “The Thing”, I’m happy. I can always refactor into something useful.
Bummer. Ideally you'd have a PWA on your phone that creates a WebRTC connection to your PC/Mac running this model. Who wants to vibe code it? With Livekit, you get most of the tricky parts served on a silver platter.
This is the way. This is something I’m working on but for other applications. WebRTC voice and data over LiveKit or Pion to have conversations.
This is interactive:
https://github.com/NVIDIA/personaplex