The 2-bit variant is probably slower because 2-bit values don't line up with register sizes or with how data is read in blocks. There's no additional benefit either: the architecture doesn't read 2 bits at a time, probably 4 bits at minimum, so going narrower than that hurts utilization instead of helping it.
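
To make the alignment point concrete, here is a minimal sketch, assuming the 2-bit values are stored four to a byte. The layout and the helper names `pack_2bit` and `unpack_2bit` are hypothetical, not taken from the work under review. Since the smallest unit you can load is a whole byte, every 2-bit value costs an extra shift and mask after the read:

```python
def pack_2bit(values):
    """Pack integers in [0, 3] into bytes, four values per byte (low bits first)."""
    if any(v < 0 or v > 3 for v in values):
        raise ValueError("each value must fit in 2 bits (0..3)")
    # Pad with zeros so the length is a multiple of four.
    padded = list(values) + [0] * (-len(values) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        byte = 0
        for j, v in enumerate(padded[i:i + 4]):
            byte |= v << (2 * j)  # place value j at bit offset 2*j
        out.append(byte)
    return bytes(out)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit values from packed bytes."""
    values = []
    for byte in packed:  # a whole byte is read, never 2 bits alone
        for j in range(4):
            values.append((byte >> (2 * j)) & 0b11)  # shift, then mask
    return values[:count]


packed = pack_2bit([1, 2, 3, 0, 3])
restored = unpack_2bit(packed, 5)
```

This only illustrates the unpacking overhead in plain Python; it says nothing about how a real kernel on any particular hardware handles it.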
Really good visualizations overall.