Yeah, I think either the method doesn't work well, or there is something off with their tuning.

Their block-by-block generation method seems too local: each 3x3 section (i.e., the blocks generated from their immediate neighbors) looks a lot more coherent than sections of 4x4 and larger. I think it would need a less local context, and in general it might also need to be paired with some kind of guidance system (e.g., in the office example, one that generates the overall floor layout).
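To make the locality point concrete, here's a toy sketch in Python. It is not their method. The tile names, the `COMPATIBLE` table, and the `generate`/`zoned_layout` helpers are all made up for illustration. Each block only checks its already-generated left and top neighbors, so every adjacent pair ends up consistent, but nothing ties distant blocks together unless a coarse layout is passed in as guidance.

```python
import random

# Hypothetical tile vocabulary for an "office" grid (not from the original method).
TILES = ["desk", "corridor", "meeting", "wall"]

# Hypothetical symmetric adjacency rules: which tiles may sit next to each other.
COMPATIBLE = {
    "desk": {"desk", "corridor", "wall"},
    "corridor": {"desk", "corridor", "meeting", "wall"},
    "meeting": {"corridor", "meeting", "wall"},
    "wall": {"desk", "corridor", "meeting", "wall"},
}


def generate(rows, cols, layout=None, seed=0):
    """Fill the grid block by block in raster order.

    Each block only looks at its left and top neighbors (purely local
    conditioning). An optional coarse `layout` (a set of allowed tiles per
    cell) acts as the global guidance.
    """
    rng = random.Random(seed)
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            candidates = set(TILES)
            if c > 0:
                candidates &= COMPATIBLE[grid[r][c - 1]]
            if r > 0:
                candidates &= COMPATIBLE[grid[r - 1][c]]
            if layout is not None:
                candidates &= layout[r][c]
            if not candidates:
                raise ValueError(f"no valid tile at ({r}, {c})")
            grid[r][c] = rng.choice(sorted(candidates))
    return grid


def zoned_layout(rows, cols):
    """Hypothetical floor plan: desks on the left, a corridor column, meetings on the right."""
    mid = cols // 2
    layout = []
    for _ in range(rows):
        row = []
        for c in range(cols):
            if c < mid:
                row.append({"desk", "wall"})
            elif c == mid:
                row.append({"corridor"})
            else:
                row.append({"meeting", "wall"})
        layout.append(row)
    return layout


def locally_coherent(grid):
    """True if every horizontally/vertically adjacent pair obeys COMPATIBLE."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c + 1] not in COMPATIBLE[grid[r][c]]:
                return False
            if r + 1 < rows and grid[r + 1][c] not in COMPATIBLE[grid[r][c]]:
                return False
    return True


if __name__ == "__main__":
    for row in generate(6, 8, layout=zoned_layout(6, 8), seed=1):
        print(" ".join(t[0] for t in row))
```

Without a layout, `generate(6, 8)` still passes `locally_coherent`, but meeting tiles can land anywhere they don't touch a desk, so there's no floor plan at larger scales. Passing `zoned_layout(6, 8)` keeps the corridor in one column and the meeting tiles on the right, which is roughly the kind of guidance I mean.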