In any case, it seems like a "simple" problem to solve. An accelerometer chip costs a few cents, and the data rates can be handled by a very light wiring harness, e.g. I2C.
So embedding such a sensor in every rigid component, wiring a single data line to all of them (using the chassis as electrical ground) and feeding the data back to the model seems a trivial way to work around this problem without any kind of real pressure sensitivity. The model knows the inputs it gives to the actuators/servos, so it will quickly learn to predict the free mechanical behavior of the body, and use any deviation to derive data equivalent to pressure and force feedback.
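The residual idea could be sketched like this - a minimal, hypothetical example where the model's predicted free-motion acceleration is compared against the measured one, and the deviation is used as a contact/force proxy (all names, values, and the threshold are made-up illustrations, not anyone's actual implementation):

```python
# Sketch (hypothetical): treat deviations between predicted and measured
# acceleration as a proxy for external contact forces. The threshold and
# sample values below are illustrative assumptions.

def detect_contact(predicted_accel, measured_accel, threshold=0.5):
    """Return the per-axis residual and a contact flag.

    predicted_accel / measured_accel: (x, y, z) tuples in m/s^2.
    A residual magnitude above `threshold` suggests the limb met
    resistance the free-motion model didn't account for.
    """
    residual = tuple(m - p for p, m in zip(predicted_accel, measured_accel))
    magnitude = sum(r * r for r in residual) ** 0.5
    return residual, magnitude > threshold

# Free swing: measurement matches the prediction -> no contact flagged.
_, contact = detect_contact((0.0, 9.8, 0.0), (0.02, 9.79, -0.01))

# Obstruction: the limb decelerates sharply -> the residual flags contact.
_, hit = detect_contact((0.0, 9.8, 0.0), (0.0, 7.5, 0.0))
```

In a real system the prediction would come from the learned model conditioned on the actuator commands, not a hand-fed tuple, but the residual-as-force-signal logic is the same.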
Another possible source of data is the driving current of the motors/actuators which is proportional to the mechanical resistance the limb encounters. All sorts of garbage sources of data that were almost useless noise in the classical approach become valuable with a model large enough.
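For a brushed DC motor the relationship is roughly linear (torque ≈ K_t × current), so a rough external-load estimate could look like this - the constants here are invented for illustration:

```python
# Sketch (hypothetical): infer external load torque from motor current.
# For a brushed DC motor, torque is roughly proportional to current:
#   tau = K_t * I
# K_t and the no-load current below are made-up illustrative values.

K_T = 0.05      # torque constant, N*m per A (assumed)
I_FREE = 0.3    # current drawn with no mechanical load, A (assumed)

def external_torque(current_a):
    """Estimate load torque by subtracting the no-load baseline."""
    return K_T * max(0.0, current_a - I_FREE)

# Drawing only the no-load current -> no external resistance sensed.
# Drawing 2.3 A -> the limb is pushing against roughly 0.1 N*m of load.
```

This is exactly the kind of "garbage" channel mentioned above: noisy and indirect on its own, but plausibly informative once a large model fuses it with the other inputs.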
> they will likely have to collect both the right data, and learn the right thing.
The "bitter lesson" says to stop trying to find simple rules for how to do things - stop trying to understand - and instead to use massive data and massive search to deal with all the incredibly fussy and intractable details magically.
But the article here is saying that the lesson is false at its root, because in fact lots of understanding is applied at the point of choosing and sanitising the data. So just throwing noise at the model won't do.
This doesn't seem to match experience, where information can be gleaned from noise and "garbage sources of data ... become valuable with a model large enough", but maybe there's something illusory about that experience, IDK.
Natural language wasn't solved by brute force until we started using trillion-parameter models trained on the whole internet, every book, and every article ever published.
I don't know of anyone spending tens of billions on this problem like Microsoft did for OpenAI. First you'd have to build up a dataset of trillions of token equivalents for motion; what that even looks like is largely guesswork. Then you'd need to build a supercomputer to scale the current SOTA motion model up to 100 times the size of the biggest model today. Then you'd have to pretrain and finetune the models.
If, after all that, dexterity still isn't solved, all we can say is that we need more data and bigger models.
People seriously don't understand how big big data for AI is and what a moonshot GPT3 and 4 were.
Tesla's approach is "start with motion captured data, move on to first person view video demonstrations, then move on to any video demonstrations - i.e. feed YouTube into the system and hope it learns something from that".
And that led to a car that kills people.
While the companies that have cars moving around safely all used a very diverse mix of human-built and ML-created models - flying completely in the face of the Bitter Lesson.
Isn’t one of the main functions of the brain / nervous system to “filter” noisy sensory input data to provide a coherent “signal” to perception? Perhaps a smaller or more specialized model could do that if we ended up packing the “skin” of the humanoid with various sensors.
In theory yes, a small(er) model to do almost anything exists in the problem space of all models.
The problem is getting to that model state. Evolution found these models by running a ridiculously huge number of experiments, with the cut-off function being 'can it breed before it dies on a limited number of calories'.
At least at this point it doesn't seem likely we can find a shortcut beyond that necessary computation. Evolution did it with time and parallelism. We do it differently with scale and rapid energy usage.
> Another possible source of data is the driving current of the motors/actuators which is proportional to the mechanical resistance the limb encounters.
The problem is precisely the actuators. Many of a human's muscles come in pairs - agonist and antagonist muscles [1] - and it's hard to match the way human muscles work, and their relatively tiny size, in a non-biological actuator.
Just take your elbow and angle it to 90 degrees, then rapidly close it so your upper and lower arm are (almost) in parallel. An absolutely easy, trivial task for the pair of muscles controlling your tendons. But now try to replicate even this small feat with a motor-based actuator. You either use a worm gear to prevent the limb from going in the wrong direction but lose speed, or you use some sort of stepper motor that's very hard to control and takes up a lot of space.
[1] https://en.wikipedia.org/wiki/Anatomical_terms_of_muscle
> Just take your elbow and angle it to 90 degrees, then rapidly close it so your upper and lower arm are (almost) in parallel.
That's trivial with modern flat motors and position feedback. In fact, motors can do it faster and with more precision than we can.
The only reason it was ever hard was because motors didn't have a lot of torque/volume.
The reason our muscles come in pairs is because they can only really apply force in one direction. Motors don't have this limitation, and don't need to be paired.
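A toy simulation makes the point - a single direct-drive motor with position feedback doing the bidirectional "elbow snap" that muscles need a pair for. All the gains, inertia, and timings below are made-up illustrative numbers, not real robot parameters:

```python
# Sketch (assumed numbers): a PD position loop on a direct-drive joint.
# Unlike a muscle, one motor applies torque in BOTH directions, so a
# single actuator with position feedback can close the joint rapidly.

import math

def simulate(theta_target, kp=50.0, kd=10.0, inertia=0.1,
             dt=0.001, steps=2000):
    """Euler-integrate a PD-controlled joint for `steps * dt` seconds."""
    theta = math.radians(90)  # start with the "elbow" at 90 degrees
    omega = 0.0               # initially at rest
    for _ in range(steps):
        # PD control law: spring toward the target, damp the velocity.
        torque = kp * (theta_target - theta) - kd * omega
        omega += (torque / inertia) * dt
        theta += omega * dt
    return theta

# Close the joint to ~10 degrees (upper and lower arm almost parallel)
# in two simulated seconds, with no antagonist actuator anywhere.
final = simulate(math.radians(10))
```

With these (overdamped) gains the joint settles at the target; the same motor would open it again just by changing the sign of the commanded torque, which is exactly the asymmetry muscles can't manage alone.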
Anyway, motors still don't have enough torque density for fine manipulators, and the lack of sensory data will still stop you from interacting well with the outside world.
From the article: a human hand has about 17,000 low-threshold mechanoreceptors in the glabrous skin (where hair doesn’t grow) of the hand, with about 1,000 of them right at the tip of each finger, but with much lower density over the rest of each finger and over the palm. These receptors come in four varieties (slow vs fast adapting, and a very localized area of sensitivity vs a much larger area) and fire when they sense pressure applied or released.
Where can you buy the artificial equivalent?
Naturalistic fallacies will only carry you so far. For example, my 12 year old car has none of the incredibly adapted limbs and muscles of a cheetah, but can still easily exceed the animal land speed.
The article makes a compelling case that a certain kind of sensory input and learning is necessary to crack robotic movement in general; it remains to be seen whether an array of sensors as fine as the human hand's is useful outside very specific use cases. A robot that can stock shelves reliably would still be immensely useful and very generalizable, even if it can't thread a needle due to limited fine sensory abilities.
You are moving the goalpost.
Title of the article you're commenting: Why Today’s Humanoids Won’t Learn Dexterity
Thesis the article is contradicting: The idea is that humanoid robots will share the same body plan as humans, and will work like humans do in our built-for-humans environment. This belief requires that, instead of building different special-purpose robots, we will have humanoid robots that do everything humans can do.
You are now arguing that a specialized robot lacking dexterity would still be immensely useful. Nobody is disputing that. It's just not what the article is about.