This is rarely the correct thing to do. Users don't particularly like it when you refuse to process a whole document because it has an error somewhere in it.
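To make the difference concrete, here is a minimal Python sketch contrasting the two approaches. The function names `decode_strict` and `decode_lenient` are made up for illustration; the only real API used is `bytes.decode` with its standard `errors` argument.

```python
def decode_strict(data: bytes) -> str:
    """Reject the whole input if any byte sequence is malformed."""
    return data.decode("utf-8")  # raises UnicodeDecodeError on bad input


def decode_lenient(data: bytes) -> str:
    """Keep going: each malformed sequence becomes U+FFFD (the replacement character)."""
    return data.decode("utf-8", errors="replace")


# One damaged spot in an otherwise fine document (0xC0 0xAF is an overlong "/").
damaged = b"first paragraph \xc0\xaf second paragraph"

try:
    decode_strict(damaged)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # the user gets nothing at all

text = decode_lenient(damaged)  # the user gets the document, with the damage marked
```

With the lenient variant the rest of the text survives and the bad bytes show up as visible replacement characters, which is usually what a user would prefer over an outright failure.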
Even for identifiers, you probably want to do all kinds of normalization well beyond the level of UTF-8 anyway, so overlong sequences and other encoding errors are really not an inherent security issue.
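As a rough sketch of what that could look like: decode leniently, then normalize the resulting text before comparing identifiers. The choice of NFKC plus case folding is an assumption made for this example, not a recommendation for every system, and `canonical_identifier` is a hypothetical helper name.

```python
import unicodedata


def canonical_identifier(raw: bytes) -> str:
    """Hypothetical helper: decode leniently, then normalize for comparison.

    Assumed policy for this sketch: NFKC normalization followed by case folding.
    """
    # Malformed input (including overlong forms) turns into U+FFFD here;
    # Python's decoder never maps it to the character it "meant" to encode.
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFKC", text).casefold()


# A fullwidth "F" (U+FF26) and a plain "F" end up as the same identifier.
same = canonical_identifier("\uff26oo".encode("utf-8")) == canonical_identifier(b"FOO")

# An overlong encoding of "/" (0xC0 0xAF) does not sneak a slash through.
smuggled = canonical_identifier(b"..\xc0\xafetc")
has_slash = "/" in smuggled
```

The point of the sketch is that checks run on the decoded, normalized text, so an overlong sequence has no way to bypass them: it is never interpreted as the character it imitates.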