Brooks describes how speech is preprocessed by chopping it into short time segments and converting each segment to the frequency domain. He then bemoans the fact that there's no similar preprocessing for touch data. OK.
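For anyone who hasn't seen it, that preprocessing is basically a short-time Fourier transform. A rough sketch (the parameters here are illustrative, not from the article):

    # Sketch of the speech preprocessing described above: chop the waveform
    # into short overlapping frames and take an FFT of each frame (an STFT).
    import numpy as np
    from scipy.signal import stft

    sample_rate = 16_000                      # Hz
    audio = np.random.randn(sample_rate)      # stand-in for 1 second of speech
    freqs, times, spectrogram = stft(audio, fs=sample_rate,
                                     nperseg=400,   # 25 ms frames
                                     noverlap=240)  # 10 ms hop
    print(spectrogram.shape)  # (frequency bins, time frames)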
But then he goes on to vision, where the input to vision processing today is an array of pixels. That's not much preprocessing; it's pretty much what came off the image sensor. Older approaches to vision processing had feature extractors with various human-defined feature sets. That was a dead end. Today's neural nets find their own features to extract.
Touch sensing suffers from sensor problems. A few high-detail skin-like sensors have been built. Ruggedness and wear are a big problem.
Consider, though, a rigid tool such as an end wrench. Humans can feel out the position of a bolt with an end wrench, get the wrench around the bolt, and apply pressure to tighten or loosen a nut. Yet the total information available is the tool's position plus six degrees of freedom of force. If the business end of your tool is rigid, the amount of information you can get from it is quite limited. That doesn't mean you can't get a lot done. (I fooled around with this idea in the pre-LLM era, but didn't get very far.) That's at least a way to get warmed up on the problem.
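To make the channel count concrete, here's a rough sketch (the names are mine, not from the article) of everything a rigid tool can report per time step: a pose plus a single 6-DOF wrench, roughly a dozen scalars.

    # Hypothetical sketch of the full observation from a rigid tool:
    # the tool's pose plus one 6-DOF wrench (3 force + 3 torque components)
    # measured at the grip -- only about 13 scalars per time step.
    from dataclasses import dataclass

    @dataclass
    class RigidToolObservation:
        position: tuple[float, float, float]            # tool position, meters
        orientation: tuple[float, float, float, float]  # quaternion (w, x, y, z)
        force: tuple[float, float, float]               # N, in the tool frame
        torque: tuple[float, float, float]              # N*m, in the tool frame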
Here's a video of a surgeon practicing by folding paper cranes with small surgical tools.[1] These are rigid tools, so the amount of touch information available is limited. That's a good problem to work on.
As you tighten a bolt, the angle at which you need to apply force changes. So it's not just a fixed position plus force in 6 directions; it's force in 6 directions at each position. You can learn quite a bit about an object from such interactions, such as its weight, center of mass, etc. (a rough sketch of that follows below).
Further, robots generally have more than a single rigid manipulator.
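As a concrete instance of that point: with static force/torque readings taken in a few different tool orientations, you can back out an object's mass and center of mass. A minimal sketch (my own, assuming an idealized bias-compensated wrist force/torque sensor and gravity as the only load):

    # Estimate mass and center of mass from static wrench readings taken in
    # several tool orientations. Assumes gravity is the only external load.
    import numpy as np

    G = 9.81  # m/s^2

    def skew(v):
        # Skew-symmetric matrix so that skew(v) @ w == np.cross(v, w).
        return np.array([[0.0, -v[2], v[1]],
                         [v[2], 0.0, -v[0]],
                         [-v[1], v[0], 0.0]])

    def estimate_mass_and_com(forces, torques):
        # forces, torques: (N, 3) arrays in the sensor frame, one row per pose.
        forces = np.asarray(forces, dtype=float)
        torques = np.asarray(torques, dtype=float)
        # Mass from the average gravity-force magnitude.
        mass = np.mean(np.linalg.norm(forces, axis=1)) / G
        # torque = r x force = -skew(force) @ r; stack all poses, solve for r.
        # A single pose leaves the component of r along the force unobservable,
        # which is exactly why more angles/positions give you more information.
        A = np.vstack([-skew(f) for f in forces])
        b = torques.reshape(-1)
        com, *_ = np.linalg.lstsq(A, b, rcond=None)
        return mass, com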
Yes, it's time-varying data, but there are not that many channels. And the sensors are off-the-shelf items, although overpriced.
Yeah, humans can teleop some pretty complex operations even through cheap robot arms!
> That's a good problem to work on.
Not sure which lab it was (I think Google?), but there was a recent demo of an ML-model-driven robot that folded paper in that style as one of the tasks.
That's a bit like saying speech recognition can be solved with ML and an air pressure sensor.