> The catch is that for image resolutions >=700x700 pixels (most production use cases)

Citation needed? It looks like 2XL goes up to 800x800 pixel inputs. This isn't the dealbreaker you say it is: all pipelines benefit from thoughtful cropping and rescaling before inference.

See the URL in my comment (search for the term rfdetr-2xlarge). 2XL does indeed go up to 800x800, and it has a PML 1.0 license instead of Apache 2.0.

Rescaling is fine for some purposes but not for all. For many domain-specific objects (often less common and oddly dimensioned), downscaling will severely reduce recall. There is a reason that Roboflow slaps a non-open-source license on those specific architectures.
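To put rough numbers on the downscaling point, here is a back-of-the-envelope sketch. The image and object sizes are made up for illustration, and it assumes an aspect-preserving resize where the longer side is fitted to the model input; pipelines that stretch to a square will shrink each axis differently.

```python
def object_size_after_resize(image_wh, object_wh, target=800):
    """Return the object's (width, height) in pixels after the image is
    resized so that its longer side equals `target` (aspect ratio kept)."""
    longest = max(image_wh)
    return (object_wh[0] * target / longest, object_wh[1] * target / longest)


# Hypothetical case: a thin 40x12 px object in a 4000x3000 px frame.
w, h = object_size_after_resize((4000, 3000), (40, 12), target=800)
print(f"{w:.1f} x {h:.1f} px")  # 8.0 x 2.4 px
```

An object that ends up only a few pixels tall leaves the detector very little to work with, which is where the recall loss comes from.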

In some cases tiled inference (for example with https://github.com/obss/sahi ) might do the job.
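For anyone unfamiliar with the idea, a minimal sketch of the tiling bookkeeping follows. This is not SAHI's API; the function names, the 800 px tile size, and the overlap value are my own assumptions, and the detector call itself is left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis so that [0, length) is covered."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the far edge
    return origins


def make_tiles(width: int, height: int, tile: int = 800, overlap: int = 160) -> List[Box]:
    """Overlapping tile windows, row by row, clipped to the image bounds."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_full_image(box: Box, tile_box: Box) -> Box:
    """Shift a box predicted in tile coordinates back into image coordinates."""
    ox, oy = tile_box[0], tile_box[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


tiles = make_tiles(2000, 1400, tile=800, overlap=200)
print(len(tiles))  # 6
print(to_full_image((10, 20, 50, 60), tiles[-1]))  # (1210, 620, 1250, 660)
```

Each tile is run through the detector at native resolution, and the shifted boxes are then merged. Objects in the overlap regions get detected twice, so some form of deduplication (NMS or similar) is still needed afterwards, and inference cost grows with the number of tiles.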