A viable approach might be to vibe code the translation tool itself and verify that, for every input, it produces the expected output. Once the translation is done, the tool can be discarded.
This would require a robust test suite, though.
This is one of the cases where vibe coding might actually be useful: writing a throwaway tool.
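The verification step can be a simple golden-test harness: every input paired with its expected output, and the tool passes only if all pairs match. A minimal sketch (the `translate` function here is a trivial hypothetical stand-in for the vibe-coded tool):

```python
def translate(source: str) -> str:
    # Hypothetical stand-in for the vibe-coded translation tool;
    # this one just rewrites one construct for illustration.
    return source.replace("println!", "console.log")

# Golden cases: each input mapped to its expected translation.
GOLDEN = {
    'println!("hi")': 'console.log("hi")',
    'let x = 1;': 'let x = 1;',
}

def run_golden_tests() -> bool:
    # Collect every case where the tool's output differs from the expected one.
    failures = [(src, want, translate(src))
                for src, want in GOLDEN.items()
                if translate(src) != want]
    for src, want, got in failures:
        print(f"FAIL: {src!r}: expected {want!r}, got {got!r}")
    return not failures

if __name__ == "__main__":
    assert run_golden_tests()
```

Once the harness passes on the full corpus, the outputs are what you keep and the tool itself can be thrown away.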
I run into this dilemma with LLMs all the time.
Should you use the LLM to do the thing directly, or use the LLM to implement a tool that does the thing?
I tend to reach for the latter; it’s easier to reason about.