I see a huge accessibility opportunity for this. Gaze + voice running inside the app (with actual React state access) is way more reliable than screen-reader bolt-ons for hands-free use. Curious if you've thought about other nonverbal inputs: head nods for confirm/cancel, blink patterns, facial expressions, since you already have the webcam feed.
That’s a really interesting angle.
Accessibility wasn’t the starting point, but the more I work on this the more it feels like a natural fit.
On nonverbal inputs, I’ve focused on gaze and gestures so far. I’ve thought about things like head nods or blink patterns for simple confirm/cancel, but haven’t explored them deeply yet.
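For blink-based confirm, one cheap route might be to threshold the per-frame eye-openness value a face-landmark model already produces and treat a deliberate double blink as the trigger, since single blinks happen too often to be intentional. A rough sketch, with made-up thresholds and assuming you get an openness value in [0, 1] per frame (the class name and numbers are illustrative, not from any specific library):

```typescript
type BlinkEvent = "confirm" | null;

class DoubleBlinkDetector {
  private closed = false;        // are we currently mid-blink?
  private blinkTimes: number[] = [];

  constructor(
    private closeThreshold = 0.2, // openness below this counts as "eye closed"
    private windowMs = 600,       // two blinks within this window => confirm
  ) {}

  // Feed one frame: eye openness in [0, 1], timestamp in ms.
  update(openness: number, nowMs: number): BlinkEvent {
    if (!this.closed && openness < this.closeThreshold) {
      this.closed = true;         // eye just closed
    } else if (this.closed && openness >= this.closeThreshold) {
      this.closed = false;        // eye reopened => one blink completed
      this.blinkTimes = this.blinkTimes.filter(t => nowMs - t <= this.windowMs);
      this.blinkTimes.push(nowMs);
      if (this.blinkTimes.length >= 2) {
        this.blinkTimes = [];     // reset so confirms don't chain
        return "confirm";
      }
    }
    return null;
  }
}
```

The nice part is the detector is pure state-machine logic, so it slots into whatever frame loop the gaze pipeline already runs, and the thresholds can be tuned per user.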
Right now the main challenge is keeping everything reliable without adding too much complexity.
Curious how you’d see this used in practice?