However, I don't have any issues with the demo in the middle (the hard shadows), so the artifacting has to come from the soft shadow rules or from the "few extra tweaks".

The primary reason real shadows are soft is obviously that real lights are not point sources. I wonder how much worse the performance would be if, instead of the first two (kinda hacky) soft shadow rules, we replaced the light with maybe five lights that represent random points in a small circular light source. Maybe you'd get too much banding unless you used a much higher number of light sources, but at the very least it would be an interesting comparison to justify using the approximation.
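To make the idea concrete, here is a rough sketch of what I mean, in Python rather than whatever the demo actually uses. Everything in it is my own assumption: a 2D scene with a single circular occluder, a plain segment-vs-circle test standing in for the demo's hard shadow ray, and the helper names `segment_hits_circle`, `sample_disc` and `light_visibility`, none of which come from the article.

```python
import math
import random


def segment_hits_circle(p, q, center, radius):
    """Return True if the segment p->q passes within `radius` of `center`."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - cx, py - cy) <= radius
    # Project the circle center onto the segment and clamp to its endpoints.
    t = ((cx - px) * dx + (cy - py) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    nx, ny = px + t * dx, py + t * dy
    return math.hypot(nx - cx, ny - cy) <= radius


def sample_disc(center, radius, n, rng):
    """Draw n points uniformly distributed inside a disc (the area light)."""
    points = []
    for _ in range(n):
        # sqrt keeps the density uniform over the disc's area.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((center[0] + r * math.cos(theta),
                       center[1] + r * math.sin(theta)))
    return points


def light_visibility(point, light_samples, occluder_center, occluder_radius):
    """Fraction of light samples that have an unblocked hard shadow ray to `point`."""
    visible = sum(
        not segment_hits_circle(point, s, occluder_center, occluder_radius)
        for s in light_samples
    )
    return visible / len(light_samples)


# Hypothetical scene: small disc light above a circular occluder.
rng = random.Random(0)
samples = sample_disc((0.0, 10.0), 1.0, 5, rng)
for x in (0.0, 1.5, 2.0, 2.5, 10.0):
    print(x, light_visibility((x, 0.0), samples, (0.0, 5.0), 1.0))
```

The cost is one hard shadow ray per sample, so five samples is roughly five times the work of the hard shadow demo. It also shows where the banding would come from: with n samples the visibility can only take n + 1 distinct values, so five lights give at most six shades across the penumbra.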