Just wondering: can't AI read HTML? If it can't, how are we training our models?

The AI only sees a bit of HTML plus a bunch of JS that, when executed, generates more HTML. If the AI does not run the JS it won’t see everything. During training they probably use a crawler that runs a headless browser behind the scenes to get everything a human would get.
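To make that concrete, here is a minimal sketch using only Python's standard library. The page in `RAW_HTML` is made up for illustration, and the parser just reads the markup without executing anything, which is roughly what a plain HTTP fetch gives you.

```python
from html.parser import HTMLParser

# Hypothetical page: one static paragraph, plus a script that would
# add a second paragraph if a browser actually ran it.
RAW_HTML = """<html><body>
<p>Static intro</p>
<div id="app"></div>
<script>
document.getElementById("app").innerHTML = "<p>Rendered by JS</p>";
</script>
</body></html>"""


class VisibleText(HTMLParser):
    """Collects text outside <script>/<style>; nothing is executed."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(visible_text(RAW_HTML))
```

The "Rendered by JS" paragraph never shows up, because it only exists after a browser runs the script. A headless browser would see it; a plain fetch would not.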

So... the answer is to use the same headless browser for real-time access as was used during training? Which they already do, unless you specifically ask them to write and run a Python script that uses plain requests?

It is like generating static webpages just for SEO: obsolete since 2012[1], and a few years later for other major websites.

[1] https://www.i-programmer.info/news/81-web-general/4248-googl...

AI can't read something dynamically rendered with JavaScript, at least at the moment.

They can, but the content-to-token ratio is far lower, so they work less effectively when the raw page is put into the inference context window.
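A rough way to see the ratio problem, as a sketch only: `rough_tokens` is a crude stand-in for a real model tokenizer, and `SAMPLE` is a made-up snippet, so the numbers are illustrative rather than what any particular model would count.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: two words of content wrapped in attributes and a script.
SAMPLE = (
    '<div class="card card--wide" data-id="42"><span class="card__title">'
    "Hello world</span><script>track('card', 42);</script></div>"
)


def rough_tokens(text):
    # Crude approximation: each word and each punctuation mark is one token.
    return re.findall(r"\w+|[^\w\s]", text)


class TextOnly(HTMLParser):
    """Keeps text nodes that are not inside a <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script:
            self.parts.append(data)


def content_ratio(html):
    """Fraction of rough tokens in the raw markup that are readable content."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    content = rough_tokens(" ".join(parser.parts))
    total = rough_tokens(html)
    return len(content) / len(total)


print(f"{content_ratio(SAMPLE):.2%} of the rough tokens are content")
```

Most of the context window ends up spent on tags, attributes, and script source rather than on the text a human would actually read.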