But even in this example, the 2ms vs 0.2 is irrelevant - what matters is whatever the timings are for TB-size objects.

So why not compare that case directly? We'd also want to see how the assumed overheads behave, i.e. how they scale with object size.
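
To make the "how it scales" point concrete, here is a rough sketch of the kind of measurement I have in mind. Everything in it is an assumption for illustration: `measure_scaling` and `per_byte_cost` are hypothetical helpers, and copying a zero-filled buffer with `bytearray` is only a stand-in for the real operation under discussion.

```python
import time


def measure_scaling(op, sizes, repeats=3):
    """Time op on a buffer of each size; return [(size, best_seconds)].

    op is a stand-in for the real operation being benchmarked (assumption).
    """
    results = []
    for size in sizes:
        payload = bytes(size)  # zero-filled placeholder for the real object
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            op(payload)
            best = min(best, time.perf_counter() - start)
        results.append((size, best))
    return results


def per_byte_cost(results):
    """Divide each timing by its size to expose the per-byte cost."""
    return [(size, seconds / size) for size, seconds in results]


if __name__ == "__main__":
    # Sizes from 1 KiB to 16 MiB; a real test would go as far up as is practical.
    sizes = [2**k for k in range(10, 25, 2)]
    for size, cost in per_byte_cost(measure_scaling(bytearray, sizes)):
        print(f"{size:>10} bytes  {cost:.3e} s/byte")
```

If the per-byte cost flattens out as the size grows, the fixed overhead is being amortized and the small-object numbers say little about the TB case; if it keeps climbing, the overhead itself scales and that is the thing to report.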