You have to REALLY be into AI to do this for generation/API cost reasons (or willing to have this as a hacking project of the month expense). Even ignoring electricity, a 16 GB 5060 Ti is more expensive than 16,000 image generations. Assuming you do one every 15 seconds, that's 240,000 seconds -> more than 2 months of usage at an hour a day of generations.
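A quick back-of-the-envelope sketch of that break-even, in case anyone wants to plug in their own numbers (the GPU price and per-image API price here are rough assumptions, not quotes):

    # Break-even math; prices are rough assumptions, not quotes.
    gpu_price_usd = 480.0        # ballpark street price for a 16 GB 5060 Ti
    api_price_per_image = 0.03   # assumed per-image price for a hosted image API

    breakeven_images = gpu_price_usd / api_price_per_image   # ~16,000 images
    hours_of_generating = breakeven_images * 15 / 3600       # one image per 15 s -> ~67 h
    print(f"{breakeven_images:,.0f} images, about {hours_of_generating:.0f} hours of generating")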
If you've already got a decent GPU (or were going to get one anyways) then cost isn't really a consideration, it's just that you can already do it. For everyone else, you can probably get by just using things like Google's AI Studio for free.
>a 16 GB 5060 Ti is more expensive than 16,000 image generations
Sure, but now you get a good gaming GPU that you can write off as a business expense.
16,000? Where are you buying your GPU, or your API calls? If you don’t want to wait for a bargain then $450 will get you the GPU, and even at that price you’d only be able to buy about 10,000 standard-resolution image gen API calls. Do you do design? Editing? Touch up? You can easily blow through a few hundred API calls an hour: “Turn the stitching green… slightly less saturated… now make the stitches more ragged… a little more… now just slightly less”.
Clearly you’re looking at the task through the eyes of a hobbyist or a “project of the month”, so the workflow and pace may not be obvious, but API budgets spend fast. Just look at the benchmarks in this article to see how many tries some of these changes took: 47. There goes $3 in 3 minutes, or half that time if you’re quick on the keyboard.
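To put numbers on that burn rate (the per-edit price is my assumption; hosted image edits tend to run a few cents each):

    # Iterative-editing burn rate; per-edit price is an assumption.
    price_per_edit = 0.06     # assumed per-call price for a hosted image edit
    tries = 47                # the attempt count cited above
    print(f"{tries} tries ~ ${tries * price_per_edit:.2f}")             # ~$2.82
    edits_per_hour = 300      # "a few hundred API calls an hour"
    print(f"~${edits_per_hour * price_per_edit:.0f}/hour at that pace")  # ~$18/hour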
And even then! Well, you’re limited, aren’t you? Limited to the Gemini model, or OpenAI, or whoever, and you see the limits of any one model in the article as well. Or you plonk down for a mediocre GPU with a little VRAM headroom and choose from dozens of models, countless LoRAs, ControlNets, and other options, with infinitely flexible inpainting and outpainting. Up front you’ll need to budget at least a dozen hours to learn the local genAI tools, ComfyUI or others. Then, for under a dollar in electricity, you can queue up a dozen ideas overnight and get 1,000 variations on each of them handed to you in the morning to triage over coffee and email catch-up.
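If that overnight-queue workflow sounds abstract, here’s a minimal sketch of its shape using diffusers; the model, prompts, and counts are placeholders I picked for illustration, not a recommendation:

    # Minimal overnight batch sketch; model, prompts, and counts are placeholders.
    import os
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompts = ["idea one ...", "idea two ..."]   # a dozen ideas in practice
    variations = 1000                            # triage the output in the morning
    os.makedirs("out", exist_ok=True)

    for p, prompt in enumerate(prompts):
        for i in range(variations):
            seed = torch.Generator("cuda").manual_seed(i)   # new seed per variation
            image = pipe(prompt, generator=seed).images[0]
            image.save(f"out/prompt{p:02d}_{i:04d}.png")

A ComfyUI graph does the same job without code; the point is just that the batch runs unattended while you sleep.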
It’s not a one-size-fits-all market though, and most professionals are likely finding they want both: a low-cost, high-control, high-precision sandbox that isn’t as fast or scalable as the API, and the API for when fast and scalable is what you need.
GPUs are needed for plenty of reasons. I assume plenty have a decent dGPU, even on laptops.
I have an RTX 4080 and Kontext runs great at fp8. I run several other models besides. If you want to get at all good at this, you need tons of throwaway generations and fast iteration, and an API quickly becomes pricier than a GPU.
Precisely. Even if that inflated figure of 16,000 API calls were accurate for what the cost of a mediocre GPU would get you, that’s not an endless store of calls. I’m also on a 4080 for lighter loads, and even just writing benchmarks, exploring attention mechanisms, token salience, etc., without image gen being my specific purpose, I can trash five hundred generations every few days. More if I count the stuff that never made it that far.
The point is just having a "decent" dGPU isn't enough. Even at 16 GB you're already quantizing Flux pretty heavily; someone with a 4080 gaming laptop is going to be disappointed trying to work with 12 GB.
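Rough weights-only arithmetic for why 16 GB is already tight (assuming the roughly 12B-parameter Flux transformer; text encoders, VAE, and activations all add on top of this):

    # Weights-only VRAM estimate for a ~12B-parameter transformer (Flux-sized).
    # Text encoders, VAE, and activations are extra and not counted here.
    params = 12e9
    for name, bytes_per_param in [("bf16", 2), ("fp8", 1), ("nf4", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 2**30:.0f} GiB")
    # bf16 ~22 GiB, fp8 ~11 GiB, nf4 ~6 GiB: hence the heavy quantization at 16 GB.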