Defining media types seems right to me, but what ends up happening is that you use Swagger to define APIs instead and HATEOAS goes out the window, and part of the reason for this is just that defining media types is not something people do (though they should).

Basically: define a schema for your JSON, use an obvious CRUD mapping to HTTP verbs for all actions, embed URI local-parts in the JSON, use standard HTTP status codes, and put more detailed error information in the JSON.
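The recipe above can be sketched roughly as follows. This is a minimal, framework-free illustration under stated assumptions: the `/users` resource, the in-memory `USERS` store, the hand-rolled `USER_SCHEMA` (standing in for something like JSON Schema) and the `handle` function are all hypothetical. It shows the CRUD-to-verb mapping, an `href` local-part embedded in each representation, standard status codes, and error detail carried in the JSON body.

```python
import json
from http import HTTPStatus

# Obvious CRUD mapping: (HTTP method, does the URI name a single item?) -> action.
CRUD_MAP = {
    ("POST", False): "create",
    ("GET", False): "list",
    ("GET", True): "read",
    ("PUT", True): "update",
    ("DELETE", True): "delete",
}

# Hypothetical hand-rolled schema standing in for a real one (e.g. JSON Schema).
USER_SCHEMA = {"name": str}

# Hypothetical in-memory store standing in for a real backend.
USERS = {"42": {"name": "Ada"}}


def validate(body):
    """Return a list of schema violations (empty if the body conforms)."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    return [f"'{key}' must be {typ.__name__}"
            for key, typ in USER_SCHEMA.items()
            if not isinstance(body.get(key), typ)]


def error(status, detail):
    """Standard HTTP status code; finer-grained detail goes in the JSON body."""
    return status.value, json.dumps({"error": {"status": status.value, "detail": detail}})


def represent(user_id):
    """Embed the resource's URI local-part in its JSON so clients can follow it."""
    return {"href": f"/users/{user_id}", **USERS[user_id]}


def handle(method, user_id=None, body=None):
    """Dispatch a request on /users or /users/{user_id}; returns (status, JSON text)."""
    action = CRUD_MAP.get((method, user_id is not None))
    if action is None:
        return error(HTTPStatus.METHOD_NOT_ALLOWED, f"{method} is not supported here")
    if user_id is not None and user_id not in USERS:
        return error(HTTPStatus.NOT_FOUND, f"no user {user_id}")
    if action in ("create", "update"):
        problems = validate(body)
        if problems:
            return error(HTTPStatus.BAD_REQUEST, "; ".join(problems))
    if action == "list":
        return HTTPStatus.OK.value, json.dumps([represent(u) for u in USERS])
    if action == "read":
        return HTTPStatus.OK.value, json.dumps(represent(user_id))
    if action == "create":
        user_id = str(max(map(int, USERS), default=0) + 1)
        USERS[user_id] = body
        return HTTPStatus.CREATED.value, json.dumps(represent(user_id))
    if action == "update":
        USERS[user_id] = body
        return HTTPStatus.OK.value, json.dumps(represent(user_id))
    del USERS[user_id]  # only "delete" remains at this point
    return HTTPStatus.NO_CONTENT.value, ""
```

A client that gets back `{"href": "/users/43", "name": "Grace"}` from a `POST` can follow `href` without building URIs itself, and a `400` carries its reason in `error.detail` rather than in a custom status code.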

> (...) and part of the reason for this is just that defining media types is not something people do (...)

People do not define media types because it's useless and serves no purpose. They define endpoints that return specific resource types, and clients send requests to those endpoints expecting those resource types. When a breaking change is introduced, backend developers simply provide a new version of the API where a new endpoint is added to serve the new resource.
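The endpoint-versioning approach described above might look something like this minimal sketch. The `/v1/users` and `/v2/users` paths, the `STORED` record and the `route` function are hypothetical; the point is only that a breaking change to the resource shape gets a new path, and existing clients keep calling the old one.

```python
# Hypothetical stored record; the two versions below expose it differently.
STORED = {"42": {"first": "Ada", "last": "Lovelace"}}


def user_v1(user):
    """Original resource shape, still served to existing clients."""
    return {"name": f"{user['first']} {user['last']}"}


def user_v2(user):
    """Breaking change: the name is split, so it gets a new endpoint."""
    return {"firstName": user["first"], "lastName": user["last"]}


# One endpoint per version; old clients never see the new shape.
ROUTES = {"/v1/users": user_v1, "/v2/users": user_v2}


def route(path):
    """Resolve e.g. '/v2/users/42' to (status, body)."""
    prefix, _, user_id = path.rpartition("/")
    handler = ROUTES.get(prefix)
    if handler is None or user_id not in STORED:
        return 404, None
    return 200, handler(STORED[user_id])
```

Each client picks its version by URL alone, e.g. `route("/v1/users/42")` returns `(200, {"name": "Ada Lovelace"})`, with no request headers involved.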

In theory, media types would allow the same endpoint to support multiple resource types. Services would send specific resource types to clients if the clients asked for them by passing the media type in the Accept header. That is all fine and dandy, except this forces endpoints to support an ever more complex content negotiation scheme that no backend framework comes close to supporting, and it brings absolutely no improvement in the way clients are developed.
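For reference, here is roughly what the server side of that negotiation involves for a single endpoint. This is a simplified sketch, not a complete implementation of the HTTP rules: the vendor media types `application/vnd.example.user.v1+json` and `...v2+json`, and the functions `negotiate` and `get_user`, are hypothetical. It honours `q` values, wildcards and the rule that a more specific range overrides a less specific one, but it ignores media-type parameters other than `q`.

```python
# Hypothetical vendor media types for two incompatible versions of one resource.
# Dict order doubles as server preference when the client has no preference.
RENDERERS = {
    "application/vnd.example.user.v2+json":
        lambda u: {"firstName": u["first"], "lastName": u["last"]},
    "application/vnd.example.user.v1+json":
        lambda u: {"name": f"{u['first']} {u['last']}"},
}


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_range:
            ranges.append((media_range, q))
    return ranges


def matches(media_range, offered):
    """True if a range such as */*, application/* or an exact type covers `offered`."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    otype, _, osub = offered.partition("/")
    return rtype == otype and rsub in ("*", osub)


def specificity(media_range):
    """*/* < type/* < type/subtype; the most specific matching range wins."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(accept, offered):
    """Pick the offered media type with the highest acceptable q, or None."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in offered:  # earlier entries win ties
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)
        if q > best_q:
            best, best_q = media_type, q
    return best


def get_user(accept, user):
    """Return (status, Content-Type, body); 406 if nothing acceptable is offered."""
    media_type = negotiate(accept, list(RENDERERS))
    if media_type is None:
        return 406, None, None
    return 200, media_type, RENDERERS[media_type](user)
```

With `Accept: application/*;q=0.2, application/vnd.example.user.v2+json;q=0` the client explicitly refuses v2, so the sketch falls back to v1; and every additional version, parameter or wildcard a server wants to support adds to this logic.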

So why bother?