Sure. Claude does that. "Cogitated for 1m 50s" doesn't work for real-time applications.

You can submit many queries in parallel to increase throughput. Smaller models and faster hardware can reduce the time per query, too.

None of that gets you the 100 ms response time the parent poster talked about for real-time uses like "who is at my doorbell?"
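A rough back-of-envelope sketch of the throughput vs. latency point. It assumes a fixed per-query latency (the 1m 50s from above) and uses Little's law (throughput = queries in flight / latency). It ignores rate limits and queuing, and the function and constant names are made up for illustration:

```python
def throughput_qps(concurrency: int, latency_s: float) -> float:
    """Steady-state queries per second by Little's law (L = lambda * W)."""
    return concurrency / latency_s

LATENCY_S = 110.0  # assumed fixed per-query latency: "Cogitated for 1m 50s"
TARGET_S = 0.1     # the 100 ms doorbell budget

for n in (1, 10, 100):
    # More queries in flight raise throughput...
    qps = throughput_qps(n, LATENCY_S)
    # ...but each individual answer still takes LATENCY_S to come back.
    print(f"{n:>3} in flight: {qps:.3f} queries/s, each still {LATENCY_S:.0f} s")

# Parallelism doesn't close this gap; only a faster model or hardware does.
print(f"per-query speedup needed: ~{LATENCY_S / TARGET_S:.0f}x")
```

Going from 1 to 100 queries in flight multiplies throughput by 100, but the person at the door still waits the full latency for their answer.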

OK. Claude won't work for this use case because none of the sample data (weirdly blurry ID images) is in its training data.