With reasoning on, I found E4B to be solid, but E2B was completely unusable across several tests.