Hacker News

You could prune this drastically by just tokenizing the file with a lexer suitable for the language, turn them into a canonical state (e.g. replace the contents of any comment tokens with identical text), and check if the token state has changed. If you have a restartable lexer, you can even re-tokenize only from the current line until the state converges again or you encounter a syntax error.