I think we’re past the “if only we had more training data” myth now. There are pretty obviously far more fundamental issues with LLMs than that.

i've been working in this field for a very long time. i promise you: if you can collect a dataset for a task, you can train a model to repeat it.

the models do an amazing job interpolating, and i actually think the lack of extrapolation is a feature: it lets us have amazing tools without as much risk of uncontrollable "AGI".

look at seedance 2.0: if a transformer can fit that, it can fit anything with enough data.