I'm assuming you're referring to BFM/EXTR (the AArch64 bitfield-move and extract instructions)? NEON absolutely improves on those here.

The core I developed on (Neoverse V2) has 4 SIMD ports and 6 scalar integer ports; however, only 2 of those scalar ports support multi-cycle integer operations such as BFI, the insert form of BFM, which is essential for scalar packing.
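For anyone unfamiliar with BFM, here is a portable C model of what its insert form (BFI) computes, plus a packing routine showing why scalar packing costs one insert per element. The names `bfi64` and `pack_nibbles` and the 4-bit field layout are illustrative assumptions, not anything from a real library, and whether a compiler actually emits BFI for this C depends on the compiler and flags.

```c
#include <stdint.h>

/* Portable model of AArch64 BFI (an alias of BFM): copy the low `width`
 * bits of `src` into `dst` starting at bit `lsb`; all other bits of `dst`
 * are preserved. Assumes 0 < width and lsb + width <= 64. */
static inline uint64_t bfi64(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    uint64_t mask = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
}

/* Hypothetical packing routine: pack sixteen 4-bit values into one 64-bit
 * word, value i landing in bits [4*i, 4*i + 4). The loop performs one
 * bitfield insert per element, and each insert reads the previous result. */
static uint64_t pack_nibbles(const uint8_t vals[16])
{
    uint64_t packed = 0;
    for (unsigned i = 0; i < 16; i++) {
        packed = bfi64(packed, vals[i], 4 * i, 4);
    }
    return packed;
}
```

For example, packing the values 1 through 15 followed by 0 yields `0x0FEDCBA987654321`. As written, every insert depends on the previous value of `packed`, so the work is both one element per instruction and a serial chain.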

More importantly, NEON processes 16 elements per instruction (sixteen 8-bit lanes in a 128-bit register) instead of 1.
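To make the 16-elements-per-instruction point concrete, here is a sketch that applies the same insert idea across 16 byte lanes with the ACLE intrinsic `vsliq_n_u8` (SLI, shift left and insert). It assumes a hypothetical layout where low and high nibbles arrive in separate byte arrays; `pack_nibble_pairs` is an illustrative name, and the scalar loop doubles as the tail handler and as the fallback on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = (hi[i] << 4) | (lo[i] & 0x0F) for i in [0, n).
 * Hypothetical layout: low and high nibbles sit in separate byte arrays. */
void pack_nibble_pairs(uint8_t *out, const uint8_t *lo, const uint8_t *hi, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 16 byte lanes per iteration: SLI shifts each lane of `h` left by 4 and
     * inserts it over `l`, keeping the low 4 bits of `l` in every lane. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t l = vld1q_u8(lo + i);
        uint8x16_t h = vld1q_u8(hi + i);
        vst1q_u8(out + i, vsliq_n_u8(l, h, 4));
    }
#endif
    /* Scalar tail (or the whole job on non-NEON targets): one byte at a time. */
    for (; i < n; i++) {
        out[i] = (uint8_t)((hi[i] << 4) | (lo[i] & 0x0F));
    }
}
```

On AArch64 a single SLI produces 16 packed bytes, whereas the scalar path needs at least one insert per byte. Compilers can sometimes auto-vectorize the scalar loop on their own, so it is worth checking the generated code before hand-writing intrinsics.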