I read this a few weeks ago, and was inspired to do some experiments of my own on compiler explorer, and found the following interesting things:
// This compiles down to a memmove() call
let my_vec: Vec<_> = my_vec.into_iter().skip(n).collect();
// this results in significantly smaller machine code than `v.retain(f)`
v.into_iter().filter(f).collect();
This was all with -C opt-level=2. I only looked at generated code size, didn't have time to benchmark any of these.