The Leap Motion Controller came out back in 2014 (11 years ago, wow!) and isn't very expensive. If I recall correctly, the SDK was lacking at first, but a webcam still seems inferior. Technology hasn't been the limiting factor for quite some time now. I'm sure many projects exist that translate gestures to MIDI, some more polished than others [0][1].
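For anyone curious, the "gesture to MIDI" part is the easy bit once a tracker hands you a number. Here is a minimal sketch, assuming you already have some gesture value (say, hand height) normalized to 0.0-1.0; the function name and defaults are hypothetical, not taken from either project above. It just builds the standard 3-byte MIDI Control Change message (status 0xBn, controller number, 7-bit value).

```python
def gesture_to_midi_cc(value: float, controller: int = 1, channel: int = 0) -> bytes:
    """Hypothetical helper: map a normalized gesture value (0.0-1.0)
    to a raw 3-byte MIDI Control Change message."""
    if not 0 <= controller <= 127:
        raise ValueError("controller must be in 0..127")
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15 (MIDI channels 1-16)")
    # Clamp so tracker jitter outside the calibrated range can't overflow 7 bits.
    clamped = min(max(value, 0.0), 1.0)
    data = round(clamped * 127)
    status = 0xB0 | channel  # 0xBn = Control Change on channel n
    return bytes([status, controller, data])


if __name__ == "__main__":
    # Hand fully raised -> CC1 (mod wheel) at maximum on channel 1.
    print(gesture_to_midi_cc(1.0).hex())  # b0017f
```

The resulting bytes can be written to any MIDI output port; which library you use for that is up to you.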

That reminds me: back in 2015, I used two PlayStation Eyes (EUR 5 each) with OpenCV and the Eulerian Video Magnification (EVM) algorithm [2] on a ThinkPad X230 for a dance performance piece. It tracked movements rather than gestures and sent OSC instead of MIDI, but it worked great!

[0]: https://midipaw.com/

[1]: https://uwyn.com/geco/

[2]: https://people.csail.mit.edu/mrub/evm/