Hacker News

ericmcer 2 days ago [ - ]

How would recommend doing it? If I was just trying to pull <a/> tag links out I feel like treating it like text and using regex would be way more efficient than a full on HTML parser like JSDom or something.

singron 2 days ago [ - ]

You don't need javascript to parse HTML. Just use an HTML parser. They are very fast. HTML isn't a regular language, so you can't parse it with regular expressions.

Obligatory: https://stackoverflow.com/questions/1732348/regex-match-open...

zahlman 2 days ago [ - ]

The point is: if you're trying to find all the URLs within the page source, it doesn't really matter to you what tags they're in, or how the document is structured, or even whether they're given as link targets or in the readable text or just what.