You can submit many queries in parallel to increase throughput. Smaller models and faster hardware can reduce the time per query too.
None of that gets you the 100 ms response time the parent poster talked about for real-time uses like "who is at my doorbell?".
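To make the throughput vs. latency distinction concrete, here is a rough sketch. It does not call any real model: `fake_query` and `SIMULATED_LATENCY_S` are made-up stand-ins, with a sleep playing the role of the model call. Running queries concurrently shrinks the total wall time for the batch, but each individual query still takes about as long as it did before.

```python
import asyncio
import time

# Hypothetical per-query latency; a real model call would go here instead.
SIMULATED_LATENCY_S = 0.05


async def fake_query(i: int) -> float:
    """Stand-in for one model call; returns the latency this query observed."""
    start = time.perf_counter()
    await asyncio.sleep(SIMULATED_LATENCY_S)
    return time.perf_counter() - start


async def run_batch(n: int) -> tuple[float, list[float]]:
    """Submit n queries concurrently; return (batch wall time, per-query latencies)."""
    start = time.perf_counter()
    latencies = await asyncio.gather(*(fake_query(i) for i in range(n)))
    return time.perf_counter() - start, list(latencies)


def measure(n: int = 8) -> tuple[float, list[float]]:
    return asyncio.run(run_batch(n))


if __name__ == "__main__":
    wall, latencies = measure()
    print(f"batch wall time: {wall * 1000:.0f} ms for {len(latencies)} queries")
    print(f"fastest single query: {min(latencies) * 1000:.0f} ms")
```

The batch finishes in roughly the time of one query, so throughput goes up, but no single caller gets an answer sooner than the per-query latency allows.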
Ok. Claude will not work for this use case because none of the sample data (weirdly blurry ID images) is in the training data.