This seems like a good template for generating synthetic data: with positive/negative examples, an embedding model could be trained to align more closely with the underlying concepts.
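
To make that concrete, here's a rough sketch of how (anchor, positive, negative) triplets might be built for that kind of training. The `concepts` map, its example phrasings, and the `make_triplets` helper are all made up for illustration; they're not from the post, and a real setup would want harder negatives than random ones.

```python
import random

def make_triplets(concepts, seed=0):
    """Build (anchor, positive, negative) triplets from a concept -> phrasings map.

    Hypothetical helper: the positive shares the anchor's concept, and the
    negative is drawn at random from the phrasings of any other concept.
    """
    rng = random.Random(seed)
    triplets = []
    for concept, phrasings in concepts.items():
        # Candidate negatives: every phrasing that belongs to a different concept.
        others = [p for c, ps in concepts.items() if c != concept for p in ps]
        if len(phrasings) < 2 or not others:
            continue  # need at least one positive pair and one negative
        for i, anchor in enumerate(phrasings):
            for positive in phrasings[i + 1:]:
                triplets.append((anchor, positive, rng.choice(others)))
    return triplets

# Illustrative data only (an assumption, not taken from the original post).
concepts = {
    "refund": [
        "I want my money back",
        "How do I get a refund?",
        "Please reverse the charge",
    ],
    "shipping": [
        "Where is my package?",
        "When will my order arrive?",
        "Track my delivery",
    ],
}

triplets = make_triplets(concepts)
```

Triplets in this shape are what triplet-style or contrastive losses generally expect, so the output could be fed into whatever fine-tuning setup you already use.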

Anyway, I'd hope reranking models do better. Have you tried those?