Hacker News

This is the useful AI stuff. There are so many use cases this makes possible.

Right, and that's what I find frustrating. There are so many use cases where a local, purpose-built model that's dependably good at one thing would really make a difference. But no one is going to throw a billion dollars to give us amazing dust removal, flawless scene segmentation, etc.

Instead, you're supposed to upload it to the cloud and ask a big, multimodal frontier model to maybe please do the thing you want and nothing else.

Someone 12 hours ago

> There are so many use cases where a local, purpose-built model that's dependably good at one thing would really make a difference. But no one is going to throw a billion dollars to give us amazing dust removal, flawless scene segmentation, etc.

iPhones have models for text extraction and in-painting in the Photos App.

Neither has knobs to tune it, but I think they are decent for their intended audience (definitely not flawless, but I don’t think flawless exists anywhere, even if you drop the ‘local’ requirement).

For scene segmentation, iOS has models for detecting persons (https://developer.apple.com/documentation/Vision/segmenting-...).

It also has models for detecting faces, face features, body and hand poses, or for picking the ‘best’ selfie from a set.

(And dust removal is fairly niche compared to these, I think. Or am I overlooking some common use case for it that many people want?)

Yokohiii 10 hours ago

I have the feeling that the cloud-based providers are just using the freely available segmentation models. It's just speculation, but it doesn't seem to be a top priority for them, so they'd just bolt on anything that works.

Another problem is that the cloud solutions need a complex UI to surface segmentation to the user. But your point is that those models are probably not ready for prime time yet, and surfacing them would reveal that they are not as powerful as users expect, destroying the illusion that AI can just do anything at will.

somenameforme 15 hours ago

You can do all of this locally on a cheap video card. Search for fooocus or automatic1111 for a couple of setups that are fairly low friction to get going. Amuse AI is another one. It's not quite state of the art and also censored, but it's by far the least friction (especially if you have an AMD card) - it's pretty much plug and play. ComfyUI is the advanced do-everything workhorse. However, it's anything but comfy if you don't already have a lot of knowledge about this domain. I'd generally recommend fooocus for a balance between usability and power/flexibility.

The million image gen services online are mostly just making bank off ignorance. People don't realize that their own cheap video cards are more than enough to do everything they're paying a service an orders-of-magnitude markup for.

krackers a day ago

The highest-return small local model for me has been the built-in OCR in macOS. It has finally "solved" OCR by making high-quality results accessible to everyone. Yet the state of the art outside the Apple ecosystem seems to be Tesseract (poor results) or extremely heavy VLMs.

crimsonnoodle58 20 hours ago

PaddleOCR? Qwen3-VL 30B-A3B?

doctorpangloss a day ago

how many times have you edited a photo you took on your phone in the last 7 days?

stusmall a day ago

I think 3? I feel like that's often enough. Sometimes it's nice to do a quick dumb ass gag on a whim. If I am anything I am a man who loves a dumb ass gag.

inigyou a day ago

Good on you. I've laughed at many dumbass gags but I've only been a passive consumer of them.

stusmall a day ago

Become the dumbass change you want to see in the world

inigyou 7 hours ago

I'm not nearly creative enough.

gradientsrneat 4 hours ago

Some smartphones have a feature that detects when you're taking a picture of a menu, letter, etc. and automatically crops and unskews it for you.

TeMPOraL a day ago

Half a dozen at least.

(I'm counting only times I used generative editing options in my Galaxy phone - if I were to take your question literally, it would be "at least once every other day", simply due to rotating and cropping.)

dogomatic a day ago

Personally, about 9 times. Would be higher if it were even easier and cheaper.