Yeah, the latency numbers give you a ceiling on what your algorithm can achieve, i.e. the best case. The actual performance depends on the implementation, code generation, runtime hazards, small dependencies you may have overlooked, etc.
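To make the "ceiling" idea concrete, here is a rough back-of-the-envelope sketch. It assumes a very simplified model that is not from the discussion above: one dependency chain of identical operations, a fixed per-operation latency, and a fixed issue rate. The function name `best_case_cycles` and all the numbers are made up for illustration.

```python
import math


def best_case_cycles(chain_length: int, latency: int,
                     total_ops: int, ops_per_cycle: float) -> int:
    """Lower bound on cycles under a simplified model (an assumption).

    The longest dependency chain cannot finish faster than
    chain_length * latency, and the whole workload cannot issue
    faster than total_ops / ops_per_cycle. The larger of the two wins.
    """
    latency_bound = chain_length * latency
    throughput_bound = math.ceil(total_ops / ops_per_cycle)
    return max(latency_bound, throughput_bound)


# Hypothetical numbers: 1000 dependent adds, 4-cycle latency, 2 ops issued per cycle.
serial = best_case_cycles(chain_length=1000, latency=4,
                          total_ops=1000, ops_per_cycle=2)

# Same 1000 adds split into 8 independent chains of 125 each.
split = best_case_cycles(chain_length=125, latency=4,
                         total_ops=1000, ops_per_cycle=2)

print(serial)  # 4000
print(split)   # 500
```

Whatever you measure should come out at or above these cycle counts. If it is far above, that gap is where code generation, hazards, or an overlooked dependency are likely hiding.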