It sounds like the police mistook it as well:
> “They showed me the picture, said that looks like a gun, I said, ‘no, it’s chips.’”
So the AI did the initial detection, but the police looked at it and agreed. We haven't seen the image, but it probably did look like a gun because of a weird shadow or something.
Fundamentally, this isn't really any different from a person seeing someone with what looks like a gun and calling the cops, only for it to turn out that they didn't see it clearly.
The main issue is that as the number of images being scanned goes up, the number of false positives goes up with it. Could this be fixed by including multiple images, e.g. capturing the motion of the object, so police (and the AI) can better eliminate false positives before traumatizing some poor teen?
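To make that scaling concrete, here's a back-of-the-envelope sketch; every number in it is an assumption I'm inventing for illustration, not anything reported about this particular system:

```python
# Rough sketch of how false positives scale with the number of images scanned.
# Every number here is a made-up assumption for illustration only.

cameras = 200                  # hypothetical district-wide camera count
frames_per_second = 1          # assume the detector samples one frame/sec per camera
school_day_seconds = 8 * 3600

frames_scanned_per_day = cameras * frames_per_second * school_day_seconds
false_positive_rate = 1e-6     # assume one in a million frames is misread as a gun

expected_false_alarms = frames_scanned_per_day * false_positive_rate
print(f"{frames_scanned_per_day:,} frames/day -> ~{expected_false_alarms:.1f} false alarms/day")
# 5,760,000 frames/day -> ~5.8 false alarms/day, even at a very low per-frame error rate
```

Even at a per-frame error rate that sounds tiny, scanning millions of frames a day makes false alarms routine rather than exceptional.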
Picture? Images? But those are just frames of footage the cameras have captured! Why would one purposefully use less information to make a decision rather than more?
Just put the full footage in front of an unbiased third party for multi-stage verification first. The problem space isn't "is that weird shadow in the picture a gun or not?"; it's "does the kid in the video have a gun?". It's not hard to tell the difference between a bag of chips and a gun from body language. Presumably the kid ate chips out of the bag, making the motions one makes when doing that? Presumably the kids around him all saw the object in his hands and somehow didn't react as if it were a gun? Jeez.
Those are a lot of "presumablies". Maybe you're right. Or maybe it was mostly obscured so you really couldn't tell. How do you know it was open and he was eating? How do you know there were other kids around and he wasn't solo? Why do you think the body language would be so different? Nobody is claiming he was using a gun or threatening anyone with it. If you're just carrying something in your hand, I don't know how you could tell what the object is or isn't from body language.
It wasn't open and he wasn't eating. The AI flagged a bulge in his pants pocket, which was the empty, crumpled up bag that he put in his pocket after finishing eating all the chips.
This is quite frankly absurd. The fact that the AI flagged it is bonkers, and the fact that a human doing manual review still believed it was a gun... I mean, just, wow. The level of dangerous incompetence here is staggering.
And I wouldn't be surprised if, minutes (or even seconds) before the video frame the AI flagged, the full video showed the kid finishing the bag and stuffing it in his pocket. AIs suck at context; a human watching the full video would not have made the same mistake. But with the human mostly taken out of the loop, all they had for verification was a single, context-free frame of video.
It is frankly mind-boggling that you or anyone else can defend this crap.
> The AI flagged a bulge in his pants pocket
It's not totally clear -- we haven't seen the picture. The point is, it seemed to look like a gun. Shadows and reflections do funny things. For you to say with such confidence that this is absurd and bonkers is itself absurd without us seeing the image(s) in question.
> It is frankly mind-boggling that you or anyone else can defend this crap.
That's not appropriate. Please see HN guidelines:
https://news.ycombinator.com/newsguidelines.html
> So AI did the initial detection, but police looked at it and agreed. We don't see the image, but it probably did look like a gun because of a weird shadow or something.
Not sure I agree. The AI flagging it certainly biased the person doing the manual review toward agreeing with the AI's assessment. I can imagine a scenario where there was no AI involved, just a human watching that same surveillance feed, and (correctly) not seeing anything alarming in it.
Also I expect the AI completely failed at context. I wouldn't be surprised if the full video feed, a few minutes (or even seconds) before the flagged frame, shows the kid crumpling up the empty Doritos bag and stuffing it in his pocket. The AI probably doesn't keep all that context around to use when making a later decision, and giving just the flagged frame of video to the human may have caused them to miss out on important context.
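For what it's worth, one standard way to keep at least some temporal context is to require a detection to persist across a window of frames before alerting, instead of firing on a single suspicious-looking frame. This is only a hypothetical sketch of that idea (the function, window size, and thresholds are all mine, not the vendor's):

```python
from collections import deque

def should_alert(frame_scores, window=30, threshold=0.9, min_hits=20):
    """Alert only if a 'gun' detection persists across most of a sliding
    window of recent frames, rather than firing on one odd-looking frame.

    frame_scores: per-frame detector confidences in [0, 1].
    All parameter values are arbitrary placeholders.
    """
    recent = deque(maxlen=window)
    for score in frame_scores:
        recent.append(score >= threshold)
        if len(recent) == window and sum(recent) >= min_hits:
            return True   # detection was stable over the window; escalate to a human
    return False          # a one-frame shadow or glare never triggers an alert

# A single weird frame among normal ones would not alert:
# should_alert([0.1] * 29 + [0.95] + [0.1] * 30)  -> False
```

The tradeoff is a second or two of latency before alerting, which seems like a small price for not sending police after an empty bag of Doritos.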
Computer says it looks like a gun.
https://en.wikipedia.org/wiki/Computer_says_no