Can you elaborate more? Discord has 656m users. If 10% upload their ID, they'd have 65m ID photos to search through. There are 2 use-cases here:
1/ Safety Bans (let's pretend 0.01% of ID card users have been banned for safety reasons: 650k accounts)
If a user submits their selfie/ID card, Discord needs to compare the new image against each of the 650k banned (but deleted?) images. I can't possibly imagine how a human could remember 650k photos well enough to declare a match.
Even if such a human existed with this perfect recall, there can't be very many of them on this planet to hire.
2/ Duplicate account bans
If a user registers, how can support staff search the 65m photos without ML assistance to determine whether this is a new user or a fraudster?
0.01% of 65M is 6,500. Also apparently only 70K people uploaded their IDs.
That being said, you can still hash faces and metadata (such as ID numbers) instead of storing the whole ID as a scanned photo, if the information is only used for duplicate checking. Hashing does not increase racial bias. If your model has a bias, it will always have a margin of error.
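For the metadata half of this, a minimal sketch of duplicate checking that keeps only a fingerprint of the ID number rather than the scan might look like the following. The names (`SECRET_KEY`, `normalize_id`, `id_fingerprint`, `DuplicateChecker`) are hypothetical, and the use of a keyed hash (HMAC-SHA256) is an assumption on my part: ID numbers come from a fairly small space, so an unkeyed hash could be brute-forced back to the original number.

```python
import hashlib
import hmac

# Hypothetical server-side secret. A keyed hash (HMAC) is assumed because ID
# numbers are short and guessable, so a plain SHA-256 could be brute-forced.
SECRET_KEY = b"replace-with-a-real-secret"


def normalize_id(id_number: str) -> str:
    """Drop spaces/dashes and uppercase so formatting differences don't matter."""
    return "".join(ch for ch in id_number if ch.isalnum()).upper()


def id_fingerprint(id_number: str) -> str:
    """Return the hex HMAC-SHA256 digest of the normalized ID number."""
    data = normalize_id(id_number).encode("utf-8")
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


class DuplicateChecker:
    """Stores only fingerprints, never the scanned ID image or the raw number."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def register(self, id_number: str) -> bool:
        """Return True if this ID is new, False if it was already registered."""
        fp = id_fingerprint(id_number)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True
```

With this sketch, `register("AB 123-456")` followed by `register("ab123456")` reports the second call as a duplicate, because both normalize to the same string. Note that this only catches exact matches on the ID number; matching faces needs a similarity measure rather than an exact hash.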
Models have racial biases, and they can't handle aged faces or look-alike faces.
You don't have to use ML models for this.
If they can't handle that many users then they should close signups.
The product scales, but safely using users' data doesn't? Hardly an excuse.
Do you understand how image hashing works? You don't need machine learning just to check if two images are potentially identical.
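As an illustration of non-ML image hashing, here is a minimal sketch of an "average hash" (aHash), one of the simplest perceptual hashes. It is not necessarily what anyone in this thread had in mind. To stay self-contained it assumes the image is already a grayscale grid of 0-255 integers at least 8×8 pixels (decoding a real photo would need an image library). The `threshold` of 5 differing bits is an arbitrary illustrative choice, not a tuned value.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels 0-255, row-major; assumed >= 8x8


def downsample(img: Image, size: int = 8) -> List[List[float]]:
    """Shrink the image to size x size by averaging rectangular blocks."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Image, size: int = 8) -> int:
    """One bit per downsampled pixel: 1 if brighter than the mean, else 0."""
    pixels = [p for row in downsample(img, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def likely_same(a: Image, b: Image, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

Because the hash depends only on which regions are brighter than average, a uniformly brightened, resized, or re-compressed copy usually hashes the same or nearly the same. This detects near-identical images; it is not face recognition, so two different photos of the same person would generally not match.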