There are good mature tools for deterministic duplication detection and I intentionally focused on embedding-based to fill this gap (I didn't find other tools using this approach).

If by "more efficient" you mean to avoid embedding of the same code multiple times, this optimization is already implemented internally.