I see a huge accessibility opportunity for this. Gaze + voice running inside the app (with actual React state access) is way more reliable than screen-reader bolt-ons for hands-free use. Curious if you've thought about other nonverbal inputs (head nods for confirm/cancel, blink patterns, facial expressions) since you already have the webcam feed.

That’s a really interesting angle.

Accessibility wasn’t the starting point, but the more I work on this the more it feels like a natural fit.

On nonverbal inputs, I’ve focused on gaze and gestures so far. I’ve thought about things like head nods or blink patterns for simple confirm/cancel, but not explored them deeply yet.
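For blink patterns specifically, the gesture-classification side can stay pretty small. Here's a minimal sketch of how a "double blink = confirm" detector might look, assuming you already get an eye-aspect-ratio (EAR) stream from the webcam face landmarks; the `CLOSED` and `MAX_GAP` thresholds are made-up tunable values, not anything from the project:

```typescript
// Hypothetical sketch: classify a "double blink" (confirm) from an
// eye-aspect-ratio stream. Landmark extraction from the webcam feed is
// assumed to happen elsewhere; this is only the classification step.

type Sample = { t: number; ear: number }; // t in ms, ear = eye aspect ratio

const CLOSED = 0.18; // EAR below this counts as eyes closed (tunable)
const MAX_GAP = 400; // max ms between blink ends to count as a double blink

// True if the samples contain two completed blinks within MAX_GAP ms.
function isDoubleBlink(samples: Sample[]): boolean {
  const blinkEnds: number[] = [];
  let closed = false;
  for (const { t, ear } of samples) {
    if (!closed && ear < CLOSED) {
      closed = true; // eyes just closed
    } else if (closed && ear >= CLOSED) {
      closed = false; // eyes reopened: one full blink completed
      blinkEnds.push(t);
    }
  }
  for (let i = 1; i < blinkEnds.length; i++) {
    if (blinkEnds[i] - blinkEnds[i - 1] <= MAX_GAP) return true;
  }
  return false;
}
```

The nice part is that this is a pure function over a rolling window of samples, so it's easy to test offline and to tune per user, which matters a lot for accessibility.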

Right now the main challenge is keeping everything reliable without adding too much complexity.

Curious how you’d see this being used in practice?