A few things to consider.

In this case we're talking about a tight initialization loop with probably a single instruction in the body. The HW optimizations necessary to make a loop like this perform as well as the unrolled form are so rudimentary that they're taken for granted on basically any CPU, even 30 years ago. Seriously, we're talking about optimizations I made in an "intro to Verilog" class as an undergrad, and I'm not even a HW engineer.
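For concreteness, here's roughly what the two forms being compared look like in C. This is only a sketch: the function names, the table size, and the fill value are made up for illustration, and the real code in question could look quite different.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8 /* hypothetical size, kept small so the unrolled form fits here */

/* Loop form: a single store in the body, plus a counter update and a branch. */
void init_loop(uint8_t table[TABLE_SIZE], uint8_t value)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        table[i] = value;
    }
}

/* Unrolled form: the same stores with no counter or branch,
   at the cost of one store's worth of code per element. */
void init_unrolled(uint8_t table[TABLE_SIZE], uint8_t value)
{
    table[0] = value;
    table[1] = value;
    table[2] = value;
    table[3] = value;
    table[4] = value;
    table[5] = value;
    table[6] = value;
    table[7] = value;
}
```

Both functions leave the table in the same state. The only difference is how much code it takes to get there, and the unrolled version grows linearly with the table size.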

It also depends on how often this code is being hit. Does the code run once while the program loads? Nobody will notice a 2 microsecond improvement in loading times. Does the code run in a timing-sensitive hot path, like a game loop or a GUI rendering thread? Well, now optimization matters. But again, consider the HW argument above.

Also remember that, back then, storage wasn't cheap. 256K of code is about 18% of a 1.44MB floppy, and about 36% of a 720K floppy.
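If you want to check the floppy math, here's a tiny sketch. It assumes the nominal formatted capacities (a "1.44MB" floppy is really 1440 KiB, and a 720K floppy is 720 KiB) and ignores filesystem overhead. The helper name is made up.

```c
/* Percentage of a disk holding disk_kib KiB that code_kib KiB of code takes up.
   Hypothetical helper, for illustration only. */
double percent_of_disk(double code_kib, double disk_kib)
{
    return 100.0 * code_kib / disk_kib;
}
```

Calling `percent_of_disk(256, 1440)` gives roughly 17.8, and `percent_of_disk(256, 720)` gives roughly 35.6.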