> We chose a Saturday to format the entire codebase to avoid merge conflicts. And while our test suite gave us high confidence we'd gotten everything right, it's always a bit daunting to have a diff so large that GitHub can't render it.
The dart formatter has an internal sanity check. It walks through the unformatted and formatted strings in parallel skipping any whitespace. If any non-whitespace characters don't match, it immediately aborts. This ensures that the only thing the formatter changes is whitespace, and makes it much less spooky to run it blind on a huge codebase.
That sanity check has saved my ass a couple of times when weird bugs crept in, usually around unusual combinations of language features around new syntax.
(Unfortunately, the formatter in the past year has gotten a little more flexible about the kinds of changes it makes, including sometimes moving comments relatively to commas and brackets, so this sanity check skips some punctuation characters too, making it a little less reliable.)
I imagine a fancier version would be to compare the Abstract Syntax Trees.
I've always thought it would make sense for formatters to be baked into the toolchain so that they can reuse the language's parser (presumably exposed as a library) and then be implemented via parsing to AST and then formatted back out so that they're guaranteed to be correct and normalized. This doesn't seem to be how most formatters work in practice though, although I'm not sure if it's because of performance reasons or a lack of support for the parser being exposed in language toolchains.
That is essentially what clang-format is.
The only issue is then you're at the mercy of whatever parser your formatter uses to construct the AST
Well, if any (common, non-hobby) parser is thrown off by the reformatting, then it's probably not a safe reformatting either way.
Strictly speaking that wouldn’t work, since a1 is different from a 1, for example.
I don't think they're trying to say that this is a sufficient test for correctness but a necessary one.
Lots of formatters also unify things like trailing commas, so it would be slightly more involved than this.