Making invalid states unrepresentable may be a great idea or a terrible idea, depending on what you are doing. My experience is all in scientific simulation, data analysis, and embedded software in medical devices.
For scientific simulations, I almost always want invalid state to immediately result in a program crash. Invalid state is usually due to a bug. And it's often the kind of bug that invalidates any conclusions you'd want to draw from the simulation.
For data analysis, things are looser. I'll split the data into data that has been successfully cleaned, where invalid state is unrepresentable, and dirty data that I then inspect manually to see whether I'm wrong about what counts as "invalid" or whether I'm missing a cleaning step.
I don't write embedded software myself (although I've written control algorithms to be deployed on it and have been involved in testing that the design and implementation are equivalent), but while you can't make literally every invalid state unrepresentable, you definitely don't punch giant holes in your state machine. A good design has clean state machines, never has an uncovered case, and should reach a failure state essentially only due to outside physical events or hardware failure. Even then, where possible the software should provide enough information to intervene and fix certain physical issues. I've seen device RMAs where the root cause was a failed FPU; when your software detects the sort of error that might be a hardware failure, sometimes the best you can do is bail out very carefully. But you want these unknown failures to be a once-per-thousands-or-millions-of-device-years event.
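The "never an uncovered case" idea can be sketched in a few lines. This is a hypothetical example (the state and event names are invented, not from any real device): an exhaustively handled state machine where any hardware fault routes to a latched failure state, and a fall-through guard that a static checker like mypy or pyright can flag at analysis time if a state is ever left uncovered.

```python
from enum import Enum, auto
from typing import NoReturn

class PumpState(Enum):
    IDLE = auto()
    RUNNING = auto()
    FAULT = auto()

class Event(Enum):
    START = auto()
    STOP = auto()
    HW_ERROR = auto()

def assert_never(x: NoReturn) -> NoReturn:
    # Reached only if a case above was missed; a type checker
    # flags any code path that can arrive here.
    raise AssertionError(f"unhandled state: {x!r}")

def step(state: PumpState, event: Event) -> PumpState:
    if event is Event.HW_ERROR:
        return PumpState.FAULT           # possible hardware failure: bail out carefully
    if state is PumpState.IDLE:
        return PumpState.RUNNING if event is Event.START else PumpState.IDLE
    if state is PumpState.RUNNING:
        return PumpState.IDLE if event is Event.STOP else PumpState.RUNNING
    if state is PumpState.FAULT:
        return PumpState.FAULT           # latched until serviced
    assert_never(state)
```

Every (state, event) pair maps to some state; there is no "undefined" transition to punch a hole through.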
Sean is writing mostly about distributed systems, where it sounds like it's not a big deal if certain things are wrong, or there isn't a single well-defined problem being solved. That's very different from the domains I'm used to, so the correct engineering in that situation may more often be to allow invalid state. (EDIT: it also seems very relevant that there may be multiple live systems updated independently, so you can't just force-upgrade everything at once. You have to handle more software incompatibilities gracefully.)
> For scientific simulations, I almost always want invalid state to immediately result in a program crash.
If you have actually made invalid states unrepresentable, then it is _impossible_ for your program to transition into an invalid state at runtime.
Otherwise, you're just talking about failing fast.
> _impossible_ for your program to transition into an invalid state at runtime
Not the case for scientific computing/HPC. Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation - see for instance the CFL condition [1].
The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters (temporal step size, mesh, formula coefficients...). It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.
[1] https://en.m.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E...
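To make the blow-up concrete, here is a minimal sketch (hypothetical parameters, not from any real HPC code): the explicit FTCS scheme for the 1-D heat equation, whose CFL-style stability condition is r = ν·Δt/Δx² ≤ 1/2. The implementation is "perfect" in both runs; only the step size differs, and the run that violates the bound still produces infs/nans.

```python
import numpy as np

def diffuse(u0, nu, dx, dt, steps):
    """Explicit (FTCS) update for u_t = nu * u_xx.
    Stable only when r = nu*dt/dx**2 <= 0.5."""
    u = u0.copy()
    r = nu * dt / dx**2
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u

x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
u0 = np.exp(-200 * (x - 0.5) ** 2)   # narrow Gaussian bump

stable   = diffuse(u0, nu=1.0, dx=dx, dt=0.4 * dx**2, steps=5000)  # r = 0.4, decays
unstable = diffuse(u0, nu=1.0, dx=dx, dt=0.6 * dx**2, steps=5000)  # r = 0.6, blows up

print(np.isfinite(stable).all())     # the stable run stays finite
print(np.isfinite(unstable).all())   # the unstable run contains inf/nan
```

Nothing in the program state is "invalid" in a type-system sense at any step; the scheme itself amplifies high-frequency modes by |1 − 4r| > 1 per step until the floats overflow.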
In those situations, the functions that contain a singularity should return an Either monad, and in order to bring the resulting data back into the bloodstream of the program, you have to deal with that potential singularity. Unfortunately, scientific computing seems stuck in the age of the dinosaurs with its tooling choices, and the advancements in type systems of the last 40 years are nowhere to be seen. I always found that curious, given its adjacency to academia.
Why? You're just going to unwrap that monad and promptly crash. It's unrecoverable; it doesn't need to go back into "the bloodstream of the program." No amount of static typing will turn this runtime error into a compile-time error, so it doesn't really matter how you express it with types.
> You're just going to unwrap that monad and promptly crash. It's unrecoverable
You handle the error gracefully. It's not "unrecoverable" in the sense of an incorrect memory read arising from a logic error in the program. It's anticipatable behavior in a well-defined system of computations and should be treated as such. Simply crashing is extremely sloppy programming in this case; it's not the formal equivalent of discarding an unusable input.
> No amount of static typing will turn this runtime error into a compile-time error
On the contrary. Anything capable of statically checking dependent types can turn any runtime issue into a compile-time issue, up to and including requiring, as a facet of the type system, proof that a function's domain is respected along all paths that call it; that domain can even be inferred from the operations within the function itself.
> Up to and including requiring proof that a function's domain is respected
Does such a proof exist in this context? Or are we writing fanfiction about the problem domain now?
> It's not "unrecoverable"
Yes it is. The correct behaviour in that context is to terminate the program, which makes it unrecoverable.
> Does such a proof exist in this context? Or are we writing fanfiction about the problem domain now?
"Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation"
Yes? Function domains are trivially enforceable through the type system when you have dependent types. Even the n-dimensional case of the CFL condition is a simple constraint you can express over a type.
Have you ever actually done any work with dependent types? I'm not sure why you would think something so basic as enforcing a function domain (which isn't the same thing as a problem domain, by the way) would be "fanfiction" otherwise. I highly recommend spending a few months actually working with them, there are plenty of good languages floating around these days.
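For concreteness, a minimal Lean 4 sketch of enforcing a function's domain through its type (hypothetical name `safeHead`; a CFL-style constraint would be encoded the same way, as an inequality the caller must prove): the "invalid input" simply cannot be written down.

```lean
-- `safeHead` demands a proof that the list is non-empty,
-- so calling it on [] is a type error, not a runtime error.
def safeHead {α : Type} : (xs : List α) → xs ≠ [] → α
  | x :: _, _ => x
  | [],     h => absurd rfl h

#eval safeHead [1, 2, 3] (by decide)   -- the proof obligation is discharged at compile time
-- `safeHead ([] : List Nat) (by decide)` fails to elaborate: no such proof exists.
```

Whether the CFL inequality for a given scheme and mesh is something you can actually prove before running the job is, of course, the point under dispute in this thread.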
> The correct behaviour in that context is to terminate the program
At worst it's to leave the thread of execution, which is distinct from crashing, as you asserted above, and which is what my core point revolves around.
> Function domains are trivially enforceable through the type system when you have dependent types.
> It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.
Somehow I doubt that the "not trivial" problem of finding correct settings before starting suddenly becomes "trivial" when you throw dependent types at it.
> (which isn't the same thing as a problem domain)
yeah bud i'm aware. I meant what I said. You're supposing that it's trivial to determine what the domain of the function in question is when the original post explicitly said otherwise. This is a falsehood about the problem domain.
> At worst it's to leave the thread of execution
Leave the thread and then do what? The stated solution to the problem, according to the original post, is to restart the program with new, manually-tweaked parameters, or to straight-up modify the code:
> The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters
> Invalid state is usually due to a bug
I don't think the article is referring to that sort of issue, which sounds fundamental to the task at hand (calculations etc.). To me it's about making the code flexible with regard to future changes/requirements/adaptations/etc. I guess you could consider Y2K an example of this issue: the problem with 6-digit date codes wasn't their practicality for handling dates in the '80s and '90s, but dates that "spanned" beyond 991231, i.e. 000101.