> The ideal case would be something that can be run locally, or at least on a modest/inexpensive cluster.
48-96 GiB of VRAM is enough to run an agent that can handle simple tasks within a single source file. That's the sad truth. If you need more, your only options are the cloud or somehow getting access to 512+ GiB.
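
For a rough sense of where numbers like these come from, here is a back-of-envelope sketch. The only hard fact in it is that weight memory is parameter count times bits per weight; the 20% headroom for KV cache and activations is my assumption, not a measurement, and the function names are made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in GiB (no KV cache, no activations)."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def max_params_billion(vram_gib: float, bits_per_weight: float,
                       overhead_fraction: float = 0.2) -> float:
    """Largest model (in billions of parameters) whose weights fit in vram_gib.

    overhead_fraction is an assumed share of VRAM reserved for KV cache,
    activations and runtime buffers; real usage depends on context length.
    """
    usable_bytes = vram_gib * GIB * (1 - overhead_fraction)
    return usable_bytes * 8 / bits_per_weight / 1e9


if __name__ == "__main__":
    for vram in (48, 96, 512):
        for bits in (4, 8, 16):
            print(f"{vram:>4} GiB @ {bits:>2}-bit: "
                  f"~{max_params_billion(vram, bits):.0f}B params")
```

Treat the output as an upper bound on model size, since long agent contexts eat into the headroom quickly.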