It would be creepiest if your cursor moved somewhere related to what you were saying out loud.

The capability is there; your local hardware determines how seamless it would be.

I made something related to this with Whisper. It would just listen constantly and periodically search the web for a picture/video/gif relevant to what you were talking about, then show it.
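For anyone curious, here is a rough sketch of what that kind of loop could look like. This is not my actual code, just the general shape, and it assumes a few things: `transcribe`, `search`, and `show` are hypothetical callables you supply yourself, and the keyword picking is a naive word count with a tiny made-up stopword list.

```python
import re
from collections import Counter, deque

# Tiny illustrative stopword list (an assumption, not a real NLP resource).
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "i", "you", "it", "is", "was",
    "to", "of", "in", "on", "that", "this", "my", "we", "so", "for",
    "with", "about", "what", "were", "are", "be", "have", "had", "just",
}


def pick_query(transcript, max_terms=3):
    """Turn recent speech into a short search query (most frequent words)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return " ".join(word for word, _ in counts.most_common(max_terms))


def run(chunks, transcribe, search, show, window=5, every=3):
    """Listen chunk by chunk; every `every` chunks, search on recent speech.

    chunks     -- iterable of audio chunks (any type `transcribe` accepts)
    transcribe -- hypothetical callable: chunk -> text
    search     -- hypothetical callable: query -> result, or None if nothing
    show       -- hypothetical callable that displays a result
    Returns the list of queries that were sent to `search`.
    """
    recent = deque(maxlen=window)  # rolling window of recent transcripts
    queries = []
    for i, chunk in enumerate(chunks, start=1):
        text = transcribe(chunk).strip()
        if text:
            recent.append(text)
        if i % every == 0 and recent:
            query = pick_query(" ".join(recent))
            if query:
                queries.append(query)
                result = search(query)
                if result is not None:
                    show(result)
    return queries
```

With the open-source `whisper` package, `transcribe` could be something like `lambda audio: model.transcribe(audio)["text"]` after `model = whisper.load_model("base")`. How often you can afford to run it (the `every` value) is where the local hardware comes in.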