I have an RTX 4080 and Kontext runs great at fp8. I run several other models besides. If you want to get at all good at this, you need tons of throwaway generations and fast iteration, and an API quickly becomes pricier than a GPU.
Precisely. Even if the inflated figure of 16,000 API calls were an accurate estimate of what a mediocre GPU's price would buy you, that's not an endless store of calls. I'm also on a 4080 for lighter loads, and even just writing benchmarks, exploring attention mechanisms, token salience, and so on (image gen isn't even my main purpose), I may trash five hundred generations from my output every few days. More if I count the stuff that never made it that far.
The point is that just having a "decent" dGPU isn't enough. Even at 16 GB you're already quantizing Flux pretty heavily, and someone with a 4080 gaming laptop is going to be disappointed trying to work with 12 GB.
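For a rough sense of why 12 GB is painful even at fp8, here's some back-of-envelope math. It assumes the commonly cited approximate parameter counts (a ~12B-parameter Flux transformer plus a ~4.7B-parameter T5-XXL text encoder) and counts weights only, ignoring the VAE, activations, and CUDA overhead, which all add on top:

```python
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """GiB needed just to hold the weights at a given precision."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# Approximate figures: ~12B transformer + ~4.7B T5-XXL text encoder.
for name, bytes_pp in [("fp16/bf16", 2.0), ("fp8", 1.0), ("4-bit", 0.5)]:
    total = weight_gib(12, bytes_pp) + weight_gib(4.7, bytes_pp)
    print(f"{name}: ~{total:.1f} GiB of weights alone")
```

Even at fp8 the weights alone come to roughly 15.5 GiB, before activations or anything else, so a 12 GB card forces either deeper quantization or offloading parts of the pipeline to system RAM.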