If the html in question would include javascript that renders everything, including text, into a canvas -- yes, this is how you would parse it. And PDF is basically that