I am a structural bioinformatics engineer, so my ignorance comes from two different directions, so to speak: adjacent fields whose knowledge doesn't quite carry over.

That said, I feel there should be some kind of benchmark for this. If none exists, use your framework, pair up with a couple of pharmacists, and create one.

You're right that this should exist. The closest things I've found are DE-INTERACT (4,248 binary drug-excipient pairs, classification only) and the Chitre et al. shampoo stability dataset (812 formulations, 18 ingredients). CheMixHub has ~500K measurements, but they are mostly thermophysical properties of simple binary/ternary systems.

Nothing exists at the level of "here's a real multi-component formulation, here's what happened when it was made." Every CPG and pharma company has thousands of these records locked in R&D databases.

I've started building one (FormulaBench) with defined splits and baselines on the public data that does exist, but you're right that the real version needs domain collaborators. If you know pharmacists or formulation scientists who'd be interested in contributing, I'd genuinely welcome the introduction.