It depends on the function and the SQL implementation. You can see in this simulator that `where rand() > rand()` evaluates row by row in MySQL but only once in SQL Server, so it's easy to get this stuff messed up: even if the code looks "equivalent", it really isn't.
https://onecompiler.com/mysql/42vq8s23b https://onecompiler.com/sqlserver/42vq8tz24
Thanks, that's a bit upsetting :-)
Indeed.
On systems with unfortunate evaluation semantics for `RAND`, you can generate fresh random values for each row by creating a function for that purpose and calling it on the primary key of each row. I provide one example in the article at:
https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....
I'll include a copy here because it's short. It's for DuckDB and was created to let us generate a controllable number of fresh random values for each row:
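The DuckDB code itself didn't make it into this thread. As a hedged sketch of the same idea (not the article's exact function), here's the technique in Python: hash a seed together with the row's primary key and scale the top 53 bits of the digest into [0, 1), so every row gets its own stable "random" value no matter how the engine treats `RAND()`. The function name, seed format, and 10% threshold are all illustrative choices.

```python
import hashlib

def pseudorandom_uniform(key, seed=""):
    """Deterministically map (seed, key) to a float in [0, 1).

    Hashing the key gives each row its own stable pseudorandom value,
    sidestepping engines that evaluate RAND() only once per query.
    """
    h = hashlib.sha256(f"{seed}/{key}".encode()).digest()
    # Keep the top 53 bits so the result fits exactly in a double.
    n = int.from_bytes(h[:8], "big") >> 11
    return n / (1 << 53)

# Sample roughly 10% of 1000 "rows" reproducibly: keep a row's key
# whenever its pseudorandom value falls below the threshold.
sample = [k for k in range(1000) if pseudorandom_uniform(k, seed="demo") < 0.10]
```

Because the value depends only on `(seed, key)`, the same query over the same table yields the same sample; changing the seed draws a fresh one.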
`HASH` looks like a slow function ... does something like `rand() + rowid & 0` or `((rand() * p53 + rowid) % p53) / p53` work?
Generally, table scans dominate the cost of sampling, so evaluating a "slow" function once per row doesn't matter. What does matter is whether you can push filtering expressions down into the scans to eliminate I/O and decoding work early. Some systems have trouble pushing down RAND efficiently, which can make alternatives like the deterministic function I shared advantageous.
Thanks. I think my trick was to generate (externally) a table of random numbers, then `JOIN` that randtable to the table of interest by rowid. It was only a few million numbers, so it didn't take long to generate, and it was pretty much a one-off job.
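The randtable trick above can be sketched with SQLite from Python's standard library (table and column names here are invented for illustration): precompute one random number per id, then join it onto the data table, so each row gets exactly one stable random value regardless of how the engine evaluates `RAND()`.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1000)])

# Externally generated table of random numbers, one per rowid.
rng = random.Random(42)  # fixed seed makes the sample reproducible
conn.execute("CREATE TABLE randtable (id INTEGER PRIMARY KEY, r REAL)")
conn.executemany("INSERT INTO randtable VALUES (?, ?)",
                 [(i, rng.random()) for i in range(1000)])

# Keep ~10% of rows by joining on id and filtering on the precomputed value.
sample = conn.execute("""
    SELECT d.id, d.val
    FROM data AS d
    JOIN randtable AS r ON d.id = r.id
    WHERE r.r < 0.10
""").fetchall()
```

Since the random values live in a materialized table, the same query always selects the same rows; regenerating randtable draws a new sample.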
Had a good laugh; in my experience, this is the normal response to differences between SQL implementations.