> they will likely have to both collect the right data and learn the right thing.

The "bitter lesson" says to stop trying to find simple rules for how to do things - stop trying to understand - and instead to use massive data and massive search to deal with all the incredibly fussy and intractable details magically.

But the article here is saying that the lesson is false at its root, because in fact lots of understanding is applied at the point of choosing and sanitising the data. So just throwing noise at the model won't do.

This doesn't seem to match experience, where information can be gleaned from noise and "garbage sources of data ... become valuable with a model large enough", but maybe there's something illusory about that experience, IDK.

Natural language wasn't solved by brute force until we started using trillion-parameter models trained on the whole internet, every book, and every article ever published.

I don't know of anyone spending tens of billions on this problem the way Microsoft did for OpenAI. First you'd have to build up a dataset of trillions of token equivalents for motion. What that even looks like is largely guesswork. Then you'd need to build a supercomputer to scale up the current SOTA motion model to 100 times the size of the biggest model today. Then you'd have to pretrain and finetune the models.
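For a sense of scale, here's a rough back-of-envelope (the sampling rate, tokens-per-frame, and target figures are all made-up assumptions, not published numbers):

```python
# Rough back-of-envelope: how much capture time "trillions of token
# equivalents" of motion data implies. All figures are assumptions.

SAMPLE_RATE_HZ = 50      # assumed mocap / teleoperation sampling rate
TOKENS_PER_FRAME = 20    # assumed tokens to encode one full-body pose frame
TARGET_TOKENS = 2e12     # "trillions of token equivalents"

tokens_per_hour = SAMPLE_RATE_HZ * TOKENS_PER_FRAME * 3600
hours_needed = TARGET_TOKENS / tokens_per_hour
years_needed = hours_needed / (24 * 365)

print(f"{tokens_per_hour:,.0f} tokens per hour of capture")        # 3,600,000
print(f"{hours_needed:,.0f} hours of capture needed")              # ~555,556
print(f"~{years_needed:.0f} robot-years of continuous recording")  # ~63
```

Even under generous assumptions, that's decades of continuous recording, which is why the shape of the dataset is the hard part.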

If, after all that, dexterity still isn't solved, all we can say is that we need more data and bigger models.

People seriously don't understand how big "big data" for AI is, or what a moonshot GPT-3 and GPT-4 were.
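For what that moonshot looked like in numbers (GPT-3 figures are from Brown et al., 2020; GPT-4's are undisclosed; the throughput figure is an assumption for illustration):

```python
# GPT-3 pretraining scale (Brown et al., 2020). GPT-4's figures are
# undisclosed but widely believed to be substantially larger.
params = 175e9   # model parameters
tokens = 300e9   # training tokens

flops = 6 * params * tokens   # standard ~6*N*D training-compute estimate
print(f"~{flops:.2e} FLOPs")  # ~3.15e23

# At a sustained 1 PFLOP/s (assumed for illustration):
days = flops / 1e15 / 86400
print(f"~{days:,.0f} days at 1 PFLOP/s sustained")  # ~3,646 days, ~10 years
```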

Tesla's approach is "start with motion captured data, move on to first person view video demonstrations, then move on to any video demonstrations - i.e. feed YouTube into the system and hope it learns something from that".

And that led to a car that kills people.

Meanwhile, the companies that have cars moving around safely all use a very diverse mix of human-built and ML-created models - flying completely in the face of the Bitter Lesson.

Isn’t one of the main functions of the brain / nervous system to “filter” noisy sensory input data to provide a coherent “signal” to perception? Perhaps a smaller or more specialized model could do that if we ended up packing the “skin” of the humanoid with various sensors.
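As a minimal sketch of that kind of filtering - an exponential moving average standing in for whatever small learned model you'd actually train, over an assumed patch of 64 noisy "skin" pressure sensors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: 64 pressure sensors in a patch of robot "skin", each
# reporting a shared contact pattern corrupted by heavy Gaussian noise.
n_sensors, n_steps = 64, 200
true_signal = np.sin(np.linspace(0, 4 * np.pi, n_steps))
readings = true_signal + rng.normal(scale=0.5, size=(n_sensors, n_steps))

# A tiny stand-in "filter model": exponential moving average per sensor,
# then a spatial average across the patch. A learned filter would replace this.
alpha = 0.2
smoothed = np.empty_like(readings)
smoothed[:, 0] = readings[:, 0]
for t in range(1, n_steps):
    smoothed[:, t] = alpha * readings[:, t] + (1 - alpha) * smoothed[:, t - 1]

percept = smoothed.mean(axis=0)  # the coherent "signal" handed to perception

raw_err = np.abs(readings[0] - true_signal).mean()   # one raw noisy sensor
filt_err = np.abs(percept - true_signal).mean()      # filtered percept
print(f"MAE: single raw sensor {raw_err:.2f}, filtered percept {filt_err:.2f}")
```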

In theory, yes: a small(er) model that can do almost anything exists somewhere in the space of all possible models.

The problem is getting to that model. Evolution found these models by running a ridiculously huge number of experiments, with the cut-off function being "can it breed before it dies on a limited number of calories".

At least at this point, it doesn't seem likely we can find a shortcut around that necessary computation. Evolution did it with time and parallelism; we do it differently, with scale and rapid energy usage.
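Here's a toy version of that search, with the cut-off function exactly as described (genome size, cost model, and calorie budget are all invented for illustration):

```python
import random

random.seed(0)

GENOME_LEN = 20
POP_SIZE = 200
GENERATIONS = 60
CALORIE_BUDGET = 30    # assumed energy available before death
MUTATION_RATE = 0.02

def calories_to_breed(genome):
    # Assumed cost model: a baseline cost plus a penalty per "bad" gene (0 bit).
    return GENOME_LEN + 2 * genome.count(0)

def breed(a, b):
    # One-point crossover plus per-bit mutation.
    cut = random.randrange(GENOME_LEN)
    child = a[:cut] + b[cut:]
    return [g ^ 1 if random.random() < MUTATION_RATE else g for g in child]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    # The cut-off function: only those that can breed within the budget survive.
    breeders = [g for g in population if calories_to_breed(g) <= CALORIE_BUDGET]
    if not breeders:
        breeders = population  # avoid extinction in the toy model
    population = [breed(random.choice(breeders), random.choice(breeders))
                  for _ in range(POP_SIZE)]

best = min(population, key=calories_to_breed)
print(f"cheapest genome after {GENERATIONS} generations: "
      f"{calories_to_breed(best)} calories (floor is {GENOME_LEN})")
```

No individual experiment understands anything; the selection pressure does all the work, which is exactly why the number of experiments has to be so ridiculously huge.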