> You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models
You don't even need to go that far. For example, with Exoscale Dedicated Inference[1] you just point it at the Hugging Face for the model and quantisation you want to test and it automagically spits out an OpenAI-compatible API endpoint.
[1] https://www.exoscale.com/ai-cloud-infrastructure/dedicated-i...
(I have no relationship with Exoscale, this particular product just crossed my radar recently)
I think they're just suggesting renting as a way to test that the hardware they're considering purchasing would actually be able to do what they need.
> I think they're just suggesting renting as a way to test
Well, yes, I understood that.
Which is why I started with the words "You don't even need to go that far.".
To re-phrase what I said in clearer terms:
Instead of renting an instance, then messing around with configuring Linux and whatever via SSH or Ansible or whatever. Just point a Hugging Face link at this magic service and get a ready-to-go API back. Enabling you to test your desired model spec with minimum fuss.
Ultimately the guy wants his own hardware. So why waste time messing around with someone else's VM if you just want to test a specific model spec. That is the TL;DR.