I'm looking to replace my Alexa with an alternative where I can use a realtime model like Gemini or an STT -> LLM -> TTS pipeline. It should be easy to build with an Arduino, though I'd also be happy to buy a ready-made solution.

Basic functions should include playing Spotify, answering questions, and setting timers.

The ESP32-S3 has wake word support through Espressif's esp-sr component: https://components.espressif.com/components/espressif/esp-sr...

The rest is just some vibe coding…
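For what it's worth, "the rest" could look roughly like the sketch below: once the wake word fires, captured audio goes through STT, simple commands (timers, "play ...") are handled locally, and everything else falls through to the LLM before TTS speaks the reply. This is only a sketch under assumptions: the `Assistant` class, `route`, and `parse_timer_seconds` are hypothetical names, the `stt`/`llm`/`tts` callables are placeholders for whatever services you pick, and the Spotify branch only returns a string rather than calling any real API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical pattern for phrases like "set a timer for 5 minutes".
TIMER_RE = re.compile(r"set (?:a )?timer for (\d+) (second|minute|hour)s?", re.I)
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_timer_seconds(text: str) -> Optional[int]:
    """Return the requested timer length in seconds, or None if no timer was asked for."""
    match = TIMER_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()]


def route(text: str) -> Optional[str]:
    """Handle simple commands locally; return None to fall through to the LLM."""
    seconds = parse_timer_seconds(text)
    if seconds is not None:
        # A real device would also schedule the timer here.
        return f"Timer set for {seconds} seconds."
    if text.lower().startswith("play "):
        # Placeholder only: no real Spotify call is made.
        return f"Playing {text[5:]} on Spotify."
    return None


@dataclass
class Assistant:
    stt: Callable[[bytes], str]   # speech-to-text backend (assumed)
    llm: Callable[[str], str]     # language model backend (assumed)
    tts: Callable[[str], bytes]   # text-to-speech backend (assumed)

    def handle(self, audio: bytes) -> bytes:
        """Run one utterance through STT -> (local command or LLM) -> TTS."""
        text = self.stt(audio)
        reply = route(text)
        if reply is None:
            reply = self.llm(text)
        return self.tts(reply)
```

Keeping timers and playback out of the LLM path means those commands stay fast and work even when the model is slow; swapping in a realtime model would replace the three callables with a single streaming session.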

If it's possible via vibe coding, then there are already a few projects out there that do exactly that.

I believe Home Assistant is (or was) working on a physical product for this, but I have no idea whether it is available yet.