I have an RTX 5090 machine sitting idle that I'm considering turning into a local model server for my small team (3 devs).

Would you be willing to share any lessons learned that I could use? We are weighing a SOTA subscription against self-hosting, and the buzz around Qwen3.6-27B makes me want to try deploying this machine.

Sell the machine for $4K and use the money to pay for Codex Pro for everyone for a year. Everyone will be significantly more productive and happier.

It's not even a real comparison if your team is actually using these tools for coding.

If you are deploying always-on agents (e.g. monitoring logs and services), then sure, a local Qwen server is a good choice. But for coding, the productivity cost of using a lower-performing model is far too high.

The 5-hour quota of Codex Pro on GPT 5.4 Medium lasts me around an hour and a half, maybe 2 hours, and that is already the "savvy" setup. Enable GPT 5.5 High with fast mode and you will be beached within 30 minutes of active development.

For continuous, all-day work you definitely need a higher-tier subscription.

I'm actually looking into deploying a GPU at my company because we cannot share our code externally. Qwen 3.6 looks good.

That might be true for the Plus account. On the "Pro" tier ($100-$200/month), the 5-hour limit is never a problem.

Right, I mixed those up. Still, you then have to pay that $4K every year and hand over your code. I also expect prices to go up, since no AI company (except NVIDIA, which is selling the shovels) is currently making any money.

For some projects, handing over the code might be OK (I use Codex there too), but the core app at the company I work for is currently under a strict no-AI policy. A local GPU solves this.
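A quick back-of-the-envelope check of the "sell it for $4K and buy a year of Pro" trade-off, using only numbers quoted in this thread (3 devs, $100-$200/month per seat, roughly $4K resale). This is a sketch under loose assumptions: flat per-seat pricing, and electricity, taxes and hardware depreciation are all ignored.

```python
# Back-of-the-envelope check using figures quoted in the thread.
# Assumptions: flat per-seat monthly pricing; power, taxes and
# depreciation of the GPU box are ignored.

DEVS = 3
MACHINE_RESALE_USD = 4_000  # "Sell the machine for $4K"


def annual_subscription_cost(devs: int, monthly_usd: float) -> float:
    """Yearly cost of giving every dev one subscription seat."""
    return devs * monthly_usd * 12


def months_covered(budget_usd: float, devs: int, monthly_usd: float) -> float:
    """How many months a one-off budget pays for all seats."""
    return budget_usd / (devs * monthly_usd)


if __name__ == "__main__":
    for monthly in (100, 200):  # Pro tier price range mentioned in the thread
        yearly = annual_subscription_cost(DEVS, monthly)
        months = months_covered(MACHINE_RESALE_USD, DEVS, monthly)
        print(f"${monthly}/month per seat: ${yearly:,.0f}/year, "
              f"$4K covers {months:.1f} months")
```

At the low end of that range the resale money covers a bit over a year (about 13.3 months); at $200/month it covers under seven months, and after that the cost recurs while the local box would have kept running.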

Anyone who frivolously suggests throwing away possible independence in favor of dependence on a Silicon Valley company is either incredibly naïve or acting in bad faith.

Not necessarily. I can see how betting on what AI-based coding will look like a year from now is likely a losing proposition. So the idea is to extract maximum value now and turn it into profits that will buy whatever is adequate for the next step. For comparison, the AI-based coding landscape a year ago, in May 2025, wasn't even close to what we have now, and half of the key tools did not exist.

OTOH, as we can see, larger models are showing diminishing returns, smaller models keep improving, and hardware shows no sign of getting cheaper, so holding on to existing decent GPUs may also be a winning strategy in the longer term.

I'll choose not to respond to your personal attack.

But in terms of actually running a dev team: you are free to use Qwen or another quantized local model that fits on an RTX 5090 for coding if it makes you feel more independent. However, you would struggle and spend many more hours achieving the same result, with much more debugging, long waits before each task is done, and many more prompts.

It's just not the right approach. I use Qwen and other local models all the time, but for more clearly defined monitoring and classification tasks.