That’s going to be a tricky problem full of compromises, and the answer depends entirely on how you formalize “the best possible dithered image”.

Do you care about preserving relative brightness, contrast, edges, etc.?

Human color perception is tricky, and in the setup you describe it’s entirely possible that the provided n-color palette (also, what order of magnitude of n are we talking about here?) would be inadequate for a satisfactory rendering of the provided full-color image.
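
One rough way to check whether a palette is adequate for a given image is to measure how far each pixel is from its nearest palette color. The sketch below is only an illustration: the function names are made up for this example, and plain Euclidean distance in RGB is an assumption that only loosely tracks perceived color difference.

```python
def nearest_palette_color(pixel, palette):
    """Return the palette entry closest to `pixel` by squared RGB distance."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def mean_quantization_error(pixels, palette):
    """Average Euclidean RGB distance from each pixel to its nearest palette color.

    `pixels` and `palette` are sequences of (r, g, b) tuples in 0..255.
    RGB distance is a crude stand-in for perceptual difference (assumption).
    """
    total = 0.0
    for p in pixels:
        q = nearest_palette_color(p, palette)
        total += sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return total / len(pixels)


if __name__ == "__main__":
    black_white = [(0, 0, 0), (255, 255, 255)]
    # A saturated red has no close match in a black/white palette.
    print(mean_quantization_error([(255, 0, 0)], black_white))
```

A large average error suggests that no dithering scheme will reproduce the image convincingly with that palette, although dithering can still hide part of the gap by mixing neighbouring colors.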

I'd just like a subjectively "good" result that beats the manual approaches using image manipulation programs.

n would be less than 4096, but usually much smaller (256 or 16).
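
For reference, a common baseline for dithering to a fixed palette is Floyd-Steinberg error diffusion. The sketch below is a minimal pure-Python version, not a claim about what any particular image program does: the function name and the list-of-rows image layout are assumptions for this example, and color matching uses plain RGB distance.

```python
def floyd_steinberg(image, palette):
    """Dither `image` to `palette` with Floyd-Steinberg error diffusion.

    `image` is a list of rows of (r, g, b) tuples; `palette` is a list of
    (r, g, b) tuples. Returns rows of palette indices.
    """
    height, width = len(image), len(image[0])
    # Float working copy so diffused error can accumulate per channel.
    work = [[[float(ch) for ch in px] for px in row] for row in image]
    out = [[0] * width for _ in range(height)]
    # (dx, dy, weight) for the right, lower-left, lower and lower-right neighbours.
    kernel = ((1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16))
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            # Pick the nearest palette entry by squared RGB distance.
            idx = min(
                range(len(palette)),
                key=lambda i: sum((o - c) ** 2 for o, c in zip(old, palette[i])),
            )
            out[y][x] = idx
            err = [o - c for o, c in zip(old, palette[idx])]
            # Push the quantization error onto not-yet-visited neighbours.
            for dx, dy, weight in kernel:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    for ch in range(3):
                        work[ny][nx][ch] += err[ch] * weight
    return out


if __name__ == "__main__":
    gray = [[(128, 128, 128)] * 4 for _ in range(4)]
    for row in floyd_steinberg(gray, [(0, 0, 0), (255, 255, 255)]):
        print(row)
```

The nearest-color search is linear in the palette size, which is acceptable for 16 or 256 colors; for palettes approaching 4096 entries a spatial index or lookup table would likely be needed.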