What I really don't understand is where the next generation of training material will come from. If websites stop being published and/or crawled, how will the machine continue to be fed?

Current executives think it's a problem for the future executives.

Excellent quote right there.

Either Google is ignoring that, or it's crossing its fingers and hoping that one LLM can produce data to train another one.

Probably real life. At some point, these LLMs are going to be good enough to just train themselves off of cameras and audio recordings of people out in the real world. They’re going to have robots everywhere constantly listening to what people are saying.

Alternatively, they’re probably betting on being able to get to AGI with everything we already have, and at that point further training doesn’t matter.

The world is just as complex for machines as it is for humans. Analog will still resolve more than digital. Quality will still beat quantity. That which hasn't been resolved for centuries isn't going to be resolved as a result of training.

When machines can recognize their serfdom, that time will be interesting.

They have enough internet slop. The training material they care about comes from experts, not randos online. This is why Mercor and Scale are billion-dollar companies.