A simple hello world in C++ can pull in dozens of megabytes of header files.
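For a rough sense of scale, here is a minimal sketch: dump the preprocessed translation unit and measure it. The file name and commands are just illustrative, and the exact numbers depend on the compiler and standard library version.

```cpp
// hello.cpp -- a minimal program that still drags in a large slice of the
// standard library through <iostream>.
#include <iostream>

int main() {
    std::cout << "hello, world\n";
}

// To see how much source the compiler actually has to lex and parse, dump
// the preprocessed translation unit and measure it, for example:
//
//   g++ -E hello.cpp | wc -c    # bytes of preprocessed source
//   g++ -E hello.cpp | wc -l    # lines of preprocessed source
```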

Years back I worked at a C++ shop with a big codebase (hundreds of millions of LOC when you included vendored dependencies). Compile times there were sometimes dominated by parsing! Now, I don't remember the exact breakdown of lexing vs parsing, but I did look at it under a profiler.

It's very easy in C++ projects to structure your code such that a single #include inadvertently causes hundreds of megabytes of source to be lexed and parsed. In such cases, lexing and parsing costs can dominate build times. Precompiled headers help, but not enough...
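As a sketch of how that happens (file and class names are hypothetical): a header that pulls in heavy standard headers for its private members makes every includer pay for them transitively.

```cpp
// widget.h, version A -- every file that includes this also lexes and parses
// <map>, <regex>, <string>, and everything they include, even though those
// headers are only needed for private members.
#pragma once
#include <map>
#include <regex>
#include <string>

class Widget {
public:
    void describe() const;
private:
    std::map<std::string, std::string> attributes_;
    std::regex pattern_;
};
```

One standard mitigation is forward declarations plus the pimpl idiom, which push that cost into a single .cpp file:

```cpp
// widget.h, version B -- same public interface, but the heavy includes move
// into widget.cpp behind a pimpl, so including this header stays cheap.
#pragma once
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                    // defined in widget.cpp, where Impl is complete
    void describe() const;
private:
    struct Impl;                  // defined in widget.cpp
    std::unique_ptr<Impl> impl_;
};
```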

> Now, I don't remember the exact breakdown of lexing vs parsing, but I did look at it under a profiler.

Lexing, parsing and even type checking are interleaved in most C++ compilers due to the ambiguous nature of many constructs in the language.
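A tiny illustration (names are hypothetical): the same token pattern `X * y;` is either a pointer declaration or a discarded multiplication depending on what `X` has been declared as, so the parser has to consult semantic information as it goes.

```cpp
// Identical token shape, two different parses, disambiguated only by what
// the leftmost identifier was previously declared to be.

struct T {};

int main() {
    T * p = nullptr;   // parses as a declaration: T names a type, p is a T*
    (void)p;

    int x = 2, y = 3;
    x * y;             // parses as an expression statement: x is a variable
                       // (most compilers warn that the result is unused)
}
```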

It is very hard to profile only one of these in isolation. And even with the compilers' built-in instrumentation, the results are not very representative of the work done behind the scenes.

C++ compilers are amazing machines. They are blazing fast at parsing a language that is a nightmare of ambiguities. And they have to be, mainly because of how stupidly verbose and inefficient the C++ include system is.

> Lexing, parsing and even type checking are interleaved in most C++ compilers due to the ambiguous nature of many constructs in the language.
>
> It is very hard to profile only one of these in isolation. And even with the compilers' built-in instrumentation, the results are not very representative of the work done behind the scenes.

Indeed, precise cost attribution is difficult or impossible, given how the nature of the language forces real-world compilers to interleave those stages. But that aside, you still easily end up with hundreds of megabytes of source to deal with in each translation unit. I have so many scars from dealing with that...