One thing I wish other languages had was Perl's taint mode: once enabled, input coming from the outside was "tainted", along with anything you explicitly marked as tainted. If a tainted variable was used to populate another variable (such as by concatenation), the result was itself tainted. If a tainted variable was used in certain ways (such as with the `open` call), the program crashed. The primary way to remove a taint was by running the variable through a regular expression and using the captured matches (which would not be tainted).
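To make the propagation rule concrete, here is a rough Python sketch of the same idea. It is not how Perl implements it; `Tainted`, `untaint`, and `guarded_open` are names made up for illustration, and the sketch requires the whole input to match the pattern, which is stricter than taking any capture.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper marking a string as coming from outside."""
    raw: str

    def __add__(self, other):
        # Tainted + anything stays tainted.
        return Tainted(self.raw + _raw(other))

    def __radd__(self, other):
        # Plain string + tainted is tainted as well.
        return Tainted(_raw(other) + self.raw)

    def untaint(self, pattern):
        """Return the regex captures as plain (untainted) strings."""
        match = re.fullmatch(pattern, self.raw)
        if match is None:
            raise ValueError("input does not match the expected pattern")
        return match.groups()


def _raw(value):
    # Unwrap a Tainted value; pass plain strings through unchanged.
    return value.raw if isinstance(value, Tainted) else value


def guarded_open(path, mode="r"):
    """Stand-in for a sensitive call: refuses tainted arguments."""
    if isinstance(path, Tainted):
        raise TypeError("refusing to open a tainted path")
    return open(path, mode)
```

With this, `"/tmp/" + Tainted("report") + ".txt"` is still a `Tainted` value, `guarded_open` rejects it, and only the tuple returned by `untaint` contains ordinary strings.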

This is "parse, don't validate" as a language feature. Any statically typed language has this, in the sense that you can write your domain logic in terms of a set of "untainted" domain types, and only provide safe conversion functions (parsers) from user input to domain types.

No, they really don’t have this, because for example you can still open() using an arbitrary string as a file name, a string which may have come from unvalidated input. They don’t force you to convert the string to a FileName type and also prove that you have done some sort of pattern-matching on the string.

That is true. You'd need to expose alternative versions of system functions that deal only in parsed and not raw data, and then prohibit the native variants. A little more ceremony, but also a little more flexibility.
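A minimal sketch of what such a wrapper might look like in Python, assuming a hypothetical `FileName` domain type and an `open_in` function that replaces direct calls to `open`. The allowed character set is an arbitrary choice for the example, and Python itself will not stop anyone from constructing `FileName` directly or calling `open`; that part would still need a linter rule.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Assumed policy: one path component, no separators, no leading dot.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,63}")


@dataclass(frozen=True)
class FileName:
    """Hypothetical domain type: a file name that has passed the parser."""
    value: str

    @classmethod
    def parse(cls, raw: str) -> "FileName":
        # The only intended way to obtain a FileName from raw input.
        if not _SAFE_NAME.fullmatch(raw):
            raise ValueError(f"not an acceptable file name: {raw!r}")
        return cls(raw)


def open_in(base_dir: Path, name: FileName, mode: str = "r"):
    """Replacement for open() that accepts parsed names only."""
    if not isinstance(name, FileName):
        raise TypeError("open_in() requires a FileName, not a raw string")
    return open(base_dir / name.value, mode)
```

Here `FileName.parse("../etc/passwd")` raises `ValueError`, and passing a plain string to `open_in` raises `TypeError`.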

Edit: It might be easier to instead replace input functions with ones that return TaintedString, unusable as a regular string. But it's easier to write a linter rule that prohibits any unsafe (default) system functions than one which requires safe input functions, I suppose.

Now I’m imagining a Rust UncheckedString type with a to_string() method that takes a regexp.

Ruby does. Normalization of untrusted input isn't taught or discussed enough, and neither is each platform's regex security.

Honestly, I think all CS/EE programs should require an OWASP course and that coding should require regular continuing education that includes defensive coding practices for correctness, defined behavior, and security.

This was effectively removed in Ruby 2.7, where the taint methods became no-ops. It was neat, but a bit of a blunt instrument.

True, it had it for a while. Some folks consider Ruby to be Perl's spiritual successor, while Python is oft considered the anti-Perl. Ruby's special global variables are pretty Perl-like.

I consider Ruby to be Perl 6 :)

gcc's __attribute__((tainted_args)) is pretty handy: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...

I need to read about the history of this feature. It's pretty amazing.

ps: ah well, that was fast https://en.wikipedia.org/wiki/Taint_checking#History :) (1989)

How does that work in practice?

Suppose the Table family type their son Bobby's name into a form. The Perl program now has a "tainted" string in memory - "Robert'; DROP TABLE Students --".

The Perl code passes this string through a regex that checks the name is valid. Names can include apostrophes (Miles O'Brien) and hyphens (Jean-Luc Picard) along with spaces and normal ASCII letters, so the regex passes and the string is now untainted.

> The Perl code passes this string through a regex that checks the name is valid

I think "parse, don't validate" doesn't help in this example. Naively, though, the regex would not check whether a name is valid; it would "extract all parts of the string that are provably safe".

Which is not reasonable for SQL statements, so someone invented prepared statements.
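For illustration, this is roughly what a parameterized query looks like in Python with the standard `sqlite3` module. The `Students` table with a single `name` column and the two helper functions are assumptions made for the example; the point is only that the name travels as a bound parameter and is never spliced into the SQL text.

```python
import sqlite3


def add_student(conn, name):
    # The "?" placeholder is filled in by the driver; the name is data, not SQL.
    conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))


def list_students(conn):
    return [row[0] for row in conn.execute("SELECT name FROM Students")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

add_student(conn, "Robert'; DROP TABLE Students --")
add_student(conn, "Miles O'Brien")
```

Afterwards the table still exists and `list_students(conn)` returns both names exactly as typed, apostrophes included.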

I think the idea is that the regex parsing forces the programmer to think about what they're doing with the string and what the requirements for the non-tainted variable are.

For example, a file name string would not allow unescaped directory separators, dots, line breaks, null bytes (I probably got most details wrong here...) and the regex could remove these or extract the substring until the first forbidden character.

Sure, this cannot prevent mistakes.

But the idea, I think, is not to have a "safeUserName" variable but a "safeDbStatement" one.

You should be using DBI or something that builds on DBI to use prepared statements for database interactions. That’s why it’s called the DataBase Interface.

Nice idea, thank you! I think it should be possible to make a Python object behave in a similar way (crashing when converted to string / ...), need to see if I can make it work.
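A first attempt might look something like the sketch below. `TaintedStr` is a made-up name, and this only covers the obvious conversion paths (`str()`, f-strings, `%` formatting); anything that reaches into the private `_raw` attribute bypasses it.

```python
import re


class TaintedStr:
    """Hypothetical holder that refuses implicit conversion to text."""
    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw

    def __str__(self):
        # Hit by str(x), print(x) and "%s" % x.
        raise TypeError("tainted value used as a string; untaint it first")

    def __format__(self, spec):
        # Hit by f"{x}" and format(x).
        raise TypeError("tainted value used in formatting; untaint it first")

    def __repr__(self):
        # Safe to show in logs and debuggers without leaking the content.
        return "<TaintedStr (content hidden)>"

    def untaint(self, pattern):
        """Return the raw text only if it fully matches the pattern."""
        match = re.fullmatch(pattern, self._raw)
        if match is None:
            raise ValueError("input does not match the expected pattern")
        return match.group(0)
```

Since it is not a `str` subclass, concatenating it with a string or passing it to `open` already fails with `TypeError` without any extra code.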

In PHP, you can construct objects directly from $_GET/POST (and erase everything from these vars to make sure they are not used directly), then lean on data types to make sure that these values are not used in a wrong place.