Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

> Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

The main reason you'd target C is for portability and free compiler optimisations. If you start inventing new intermediate languages or C dialects, what's the benefit of transpiling in the first place? You might as well just write your own compiler backends and output the machine code directly, with optimisations built around your own language's semantics rather than C's.

Imho, C89 is the strict subset that a compiler ought to target, assuming its authors want C's portability and free compiler optimisations. It's well understood, not overly complex, and will compile to fast, sensible machine code on any architecture from the past half century.
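To make that concrete, here is a rough sketch of what generated C89 might look like. The source-level function `sum_to(n)` and the temporaries `t0`/`t1` are made up for illustration; no particular compiler emits exactly this. The point is only that a backend can restrict itself to a very small, regular slice of C89: declarations at the top of the block, simple assignments, and `goto` for control flow.

```c
/* Hypothetical compiler output for a source-level function sum_to(n)
   that adds the integers 1..n. Names are illustrative assumptions. */
long sum_to(long n)
{
    /* C89 requires all declarations before the first statement. */
    long t0; /* accumulator */
    long t1; /* loop counter */

    t0 = 0;
    t1 = 1;
L_head:
    if (t1 > n) goto L_done; /* loop exit test */
    t0 = t0 + t1;
    t1 = t1 + 1;
    goto L_head;
L_done:
    return t0;
}
```

Calling `sum_to(10)` returns 55. The C compiler is then free to turn the `goto` spaghetti back into a proper loop and optimise it, which is the "free optimisations" part.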

Not precisely, but C-- (hard to search for!) was a C-like (or C subset?) intermediate language for compilers to generate.

I found this Reddit thread that gives a bit more detail:

https://www.reddit.com/r/haskell/comments/1pbbon/c_as_a_proj...

and the project link:

https://www.cs.tufts.edu/~nr/c--/

Sounds like why LLVM was created? (and derivatives like MLIR and NaCl) Its IR is intended to be C-like, except that everything is well-defined and it is substantially more expressive than C.
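One small example of the "well-defined" gap when you target C instead: signed integer overflow is undefined behaviour in C, so a compiler for a language with wrapping 32-bit arithmetic can't just emit `a + b` on `int`. A minimal sketch of a workaround, assuming the generated code keeps such values in `unsigned long` (the helper name `wrap_add_u32` is hypothetical):

```c
/* Hypothetical helper a C-targeting compiler might emit for 32-bit
   wrapping addition. Unsigned arithmetic is defined to wrap in C, and
   the mask keeps the result to 32 bits even where unsigned long is
   wider than 32 bits. Valid C89. */
unsigned long wrap_add_u32(unsigned long a, unsigned long b)
{
    return (a + b) & 0xFFFFFFFFUL;
}
```

With this, `wrap_add_u32(0xFFFFFFFFUL, 1UL)` is 0 on any conforming implementation, rather than depending on what the optimiser decides to do with overflow.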

For portability, hopefully C89 as well?

I think one could also use a subset compatible with a formal semantics of C. Maybe the C semantics in the K Framework, CompCert C, or C0 from Verisoft. Alternatively, whatever is supported by open-source verification tooling.

Then, we have both a precise semantics and tools to help produce robust output.