I don't really see how the vacuum can effectively clean a whole room or flat using only a CNN running on the current image in front of the robot. That would help detect obstacles, but a bumper sensor would do that as well.
All but the most basic vacuum robots map their work area and devise a plan for cleaning it systematically. The rest just bump into obstacles, rotate a random amount, and continue forward.
Don't get me wrong, I love this project and the idea of building it yourself. I just feel like that (huge) part is missing from the article?
https://opencv.org/structure-from-motion-in-opencv/
I'm not saying it's viable here to build a world map, since things like furniture can move, but some systems do this. Warehouse robots, for example, triangulate off ceiling lights, on the assumption that lights on a tall ceiling are fixed and consistent.
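A rough sketch of that triangulation in Python, for the curious -- the bearings are assumed to be world-frame (heading already compensated for), and the light positions and numbers are made up for illustration:

    import numpy as np

    def triangulate(lights, bearings):
        """Locate the robot from world-frame bearings to two fixed
        ceiling lights at known floor-projected positions. The robot
        lies on the line through each light pointing back along that
        bearing; the two lines intersect at the robot's position."""
        l1, l2 = np.asarray(lights[0], float), np.asarray(lights[1], float)
        d1 = np.array([np.cos(bearings[0]), np.sin(bearings[0])])
        d2 = np.array([np.cos(bearings[1]), np.sin(bearings[1])])
        # robot = light_i - t_i * d_i, so solve t1*d1 - t2*d2 = l1 - l2.
        t = np.linalg.solve(np.column_stack((d1, -d2)), l1 - l2)
        return l1 - t[0] * d1

    # Made-up geometry: two lights, robot actually at (1, -2).
    lights = [(0.0, 0.0), (4.0, 0.0)]
    bearings = [np.arctan2(2.0, -1.0), np.arctan2(2.0, 3.0)]
    print(triangulate(lights, bearings))  # ~ [ 1. -2.]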
The classic Roombas from a decade or so ago worked without any sort of mapping or camera at all -- they basically did a version of the "run and tumble" algorithm used by many bacteria: go in one direction until you can't any more, then head off in a random new one. It may not be efficient, but it does work for covering territory.
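Roughly, as a toy Python simulation (square room, arbitrary step sizes -- just to show the strategy does cover ground):

    import math, random

    def run_and_tumble(steps=10000, size=10.0):
        """Drive straight until hitting the room boundary, then pick a
        random new heading. Tracks visited 1x1 cells to measure coverage."""
        x = y = size / 2
        heading = random.uniform(0, 2 * math.pi)
        visited = set()
        for _ in range(steps):
            nx = x + 0.1 * math.cos(heading)
            ny = y + 0.1 * math.sin(heading)
            if 0 <= nx < size and 0 <= ny < size:
                x, y = nx, ny                             # "run"
            else:
                heading = random.uniform(0, 2 * math.pi)  # "tumble"
            visited.add((int(x), int(y)))
        return len(visited) / (size * size)  # fraction of cells reached

    print(f"covered {run_and_tumble():.0%} of the room")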
I think the only reason for mapping is to be able to block off 'no go' areas (no escaping out the front door!) and to be able to go home to the charger.
For the actual cleaning, random works great.
Surely mapping also helps reduce the time it takes to get the job done?
You are right. The original Roomba was discussed on HN 3 months ago:
https://news.ycombinator.com/item?id=46472930
Apart from just detecting obstacles, we wanted to build a robot intelligent enough to pick up on semantic cues: "this is a doorway, so I can go through it", "this is a kitchen, so I should clean it this way", and so on.
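For what it's worth, here's a toy sketch of what "semantic cues driving behaviour" could look like in Python -- every name is hypothetical (classify_frame stands in for whatever CNN the robot actually runs); none of this comes from the article:

    # Route a per-frame scene label to a cleaning behaviour.
    def classify_frame(frame) -> str:
        # Stub for a CNN scene classifier; pretend it returns a label.
        return "doorway"

    def drive_through_doorway():  return "center on the gap and pass through"
    def clean_kitchen():          return "denser passes for hard floors"
    def default_coverage():       return "normal coverage pattern"

    BEHAVIORS = {
        "doorway": drive_through_doorway,
        "kitchen": clean_kitchen,
    }

    def step(frame):
        label = classify_frame(frame)
        return BEHAVIORS.get(label, default_coverage)()

    print(step(frame=None))  # -> "center on the gap and pass through"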
It navigates by Brownian motion.