The failure mode here is usually not that the API is down; it is that the shape or behavior of its responses changed while they still look valid.

I would start with a small set of fixed requests against the dependency, store a known-good baseline response for each, and classify every difference as either schema-level (fields and types) or value-level (same structure, different data).
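As a rough illustration of that split, here is a minimal Python sketch. It assumes the responses are JSON that has already been decoded, and that arrays are homogeneous enough for the first element to stand in for the rest. The names `shape` and `classify` are made up for this example and do not come from any library.

```python
def shape(value):
    """Reduce a decoded JSON value to its type skeleton, discarding the data."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumption: list elements share one shape, so the first represents all.
        return [shape(value[0])] if value else []
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def classify(baseline, current):
    """Label a response as a schema-level change, a value-level change, or unchanged."""
    if shape(baseline) != shape(current):
        return "schema"
    if baseline != current:
        return "value"
    return "unchanged"


baseline = {"id": 1, "tags": ["a"], "owner": {"name": "x"}}
print(classify(baseline, {"id": 2, "tags": ["a"], "owner": {"name": "x"}}))    # value
print(classify(baseline, {"id": "1", "tags": ["a"], "owner": {"name": "x"}}))  # schema
```

Value-level differences are expected for most live endpoints, so in practice the fixed requests would need to target stable data, or the value check would be limited to fields known not to change.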

The important part is keeping the signal narrow enough that purely additive changes do not page people, while removals, type changes, and null-vs-absent shifts still get reviewed before they break consumers...