You don't need JavaScript to parse HTML. Just use an HTML parser; they are very fast. HTML isn't a regular language, so you can't parse it in general with regular expressions.

Obligatory: https://stackoverflow.com/questions/1732348/regex-match-open...
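
For what it's worth, here is roughly what the parser route looks like, as a minimal sketch using Python's standard-library html.parser. The names LinkCollector and extract_links are made up for illustration, and looking only at href and src attributes is my assumption about what counts as a link:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html):
    """Return link targets in document order (assumption: only href and src matter)."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

This only sees URLs that sit in those attributes; anything in running text or comments is skipped.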

The point is: if you're trying to find all the URLs in the page source, it doesn't really matter what tags they're in, how the document is structured, or even whether they appear as link targets or in the readable text.
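
A rough sketch of that kind of structure-blind scan, in Python. The pattern and the name find_urls are my own illustration, not anything standard, and the pattern is deliberately crude: it only handles absolute http/https URLs and will happily keep trailing punctuation such as a period or closing parenthesis.

```python
import re

# Assumption: a URL starts with http:// or https:// and runs until
# whitespace, a quote, or an angle bracket. Good enough for a scan,
# not a validator.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_urls(source):
    """Return every URL-looking substring in the raw page source, in order."""
    return URL_RE.findall(source)
```

Note that this treats the page as plain text, so it picks up URLs in attributes, body text, and comments alike, which is exactly the case where the document structure doesn't matter.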