There's nothing like getting a slap in the face from reality. (I mean the VR we all live in, of course).
I was plugging away trying to get this vision processing to go and it kept throwing up "wrong results". I kept hacking, trying to make the output fit my preconceptions. (Amusingly, this is exactly the algorithm the vision s/w was trying to implement.) It wouldn't fit. More hacking. More "bad output".
Finally I decided to actually measure things with a tape measure. Turned out my preconceptions were wrong and the s/w was right. :}
I'd 1/2-remembered the granny flat was 6m x 9m, but it turned out to be only 1/2 the size — 3m x 6m.
Anyway, let's look at the results of the present s/w.
As mentioned before, the idea I'm working on is a "top down" approach — trying to decide whether the scene you have matches your preconception.
There is a lot of info we know beforehand, when analysing the simple scene of a couple of joins near the ceiling. E.g. we assume the ceiling is 2.5m up (yes, I measured that and it's right!), that most corners are square, and the robot is probably close to the ground (in any case, that height can be assumed a priori because we built the robot).
It's a pity to waste that info and try to directly measure features from the scene assuming almost nothing a priori.
My s/w pans a small camera around and tries to capture the 4 corners of the room. It then internally makes a "wire model" of the room, and adjusts the parameters of length and width, as well as robot position and camera orientation, to decide whether those parameters actually appear to match what it can see in some of the scenes.
(In my granny flat I forced the camera to look at the "fifth corner" of the room, just to see what the s/w would make of it).
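To make the "wire model" idea concrete, here is a minimal sketch (not the actual s/w) of how the ceiling corners of a rectangular room could be projected into one camera view. The function names, the 500-pixel focal length and the axis conventions (x forward at pan 0, y to the left, z up) are all assumptions made for illustration; only the 2.5m ceiling height comes from the text above.

```python
import math

def ceiling_corners(width, length, ceiling_height=2.5):
    """World (x, y, z) of the four ceiling corners of a rectangular room."""
    w, l, h = float(width), float(length), float(ceiling_height)
    return [(0.0, 0.0, h), (w, 0.0, h), (w, l, h), (0.0, l, h)]

def project(point, cam_pos, pan_deg, tilt_deg, focal_px=500.0):
    """Pinhole projection of a world point for a pan/tilt camera.

    Returns (u, v) in pixels relative to the image centre (u to the
    right, v upwards), or None if the point is behind the camera.
    focal_px is an assumed value, not a measured one.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    # Components along the horizontal viewing direction and to its right.
    fx = dx * math.cos(pan) + dy * math.sin(pan)
    right = dx * math.sin(pan) - dy * math.cos(pan)
    # Apply the upward tilt of the camera.
    fwd = fx * math.cos(tilt) + dz * math.sin(tilt)
    up = -fx * math.sin(tilt) + dz * math.cos(tilt)
    if fwd <= 1e-9:
        return None
    return (focal_px * right / fwd, focal_px * up / fwd)

# Example: a 3m x 6m room seen from a robot 0.3m off the floor.
robot = (1.5, 3.0, 0.3)
expected = [project(c, robot, pan_deg=45.0, tilt_deg=30.0)
            for c in ceiling_corners(3.0, 6.0)]
```

Joining the projected corners gives the ceiling lines the model expects to see in that view, which is the sort of thing the red lines in the images further down are showing.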
So instead of trying to "fit a line" into a bunch of pixels, the top-down approach requires you to come up with a "goodness of match" between the features you expect vs those you actually see. And most "correlation" measures will do, if you have some idea of how to decide whether the correlation is statistically significant or not.
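As one example of such a measure, here is a sketch using the plain Pearson correlation between what the model expects at a set of sample points (1 on a predicted line, 0 elsewhere) and the edge strength actually seen there. This is an assumed stand-in, not necessarily the measure the s/w uses. The t-statistic is the standard textbook one for a correlation coefficient and is only a rough guide to significance.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for correlation r over n samples; large |t| suggests a real match."""
    if n < 3:
        raise ValueError("need at least 3 samples")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical sample: model says line / no line, image gives edge strength.
expected = [0, 1, 0, 1]
observed = [0.1, 0.9, 0.2, 0.8]
r = pearson(expected, observed)
```

With more sample points the same correlation gives a larger t value, which is why sampling many locations per scene helps separate a real match from a fluke.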
At first the s/w tried a "smart" approach in its parameter hunt. But it proved too smart and failed. So I went back to trying all possible parameters: e.g. room widths between 1m and 100m, robot positions at 1m gridpoints, and orientations in 10deg increments. It then finds the "best fit" in the majority of scenes with a simple correlation between random locations in the model and the assumed corresponding positions in the scene.
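The "try everything" hunt can be sketched as a plain exhaustive grid search. This is an illustration only: `grid_search` and `toy_score` are hypothetical names, and the toy score just stands in for whatever goodness-of-match the real s/w computes over the captured scenes.

```python
import itertools

def grid_search(score, widths, lengths, positions, headings_deg):
    """Try every parameter combination and return (best_score, best_params).

    score(width, length, x, y, heading_deg) must return a number where
    bigger means a better match. Robot positions outside the room are skipped.
    """
    best = None
    for w, l, (x, y), h in itertools.product(widths, lengths,
                                             positions, headings_deg):
        if not (0 <= x <= w and 0 <= y <= l):
            continue  # the robot has to be inside the room
        s = score(w, l, x, y, h)
        if best is None or s > best[0]:
            best = (s, {"width": w, "length": l,
                        "x": x, "y": y, "heading_deg": h})
    return best

def toy_score(w, l, x, y, h):
    """Stand-in score that peaks at a made-up 'true' room and robot pose."""
    return -(abs(w - 3) + abs(l - 6) + abs(x - 1) + abs(y - 2) + abs(h - 40))

# 1m steps for sizes and positions, 10deg steps for heading.
sizes = range(1, 11)
grid = [(x, y) for x in range(0, 11) for y in range(0, 11)]
best_score, best_params = grid_search(toy_score, sizes, sizes, grid,
                                      range(0, 360, 10))
```

Brute force like this is dumb but has no local minima to get stuck in, and at these grid spacings the number of combinations stays small enough to run quickly.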
The following is the "best fit" model it came up with. The red lines, which show what features are expected according to the wire-frame model, are hacked in by a badly-written bit of s/w (I am yet to find a proper tool that can insert some text or nice, thick red lines in a jpeg file).
Corner 1 (from before):
Processed image with expected ceiling lines added:
Corner 2 (bookshelving gives a false position, but is a better match than any other to the gridpoint):
Note the vertical line is about 1 ft in front of where the real wall is. Those damn books!
Corner 3:
It might seem amazing the s/w wasn't fooled by the vertical on the doorway. But we're using the info from all corners, here, so it's effectively filtered out along with all those coloured pixels.
Corner 4:
Again, there were all sorts of verticals in those dangling wires and the bookshelf itself, but they were ignored because we're matching a simple "preconception" to an image, rather than operating from almost-complete ignorance on the single scene.
Finally, Corner 5:
Since the model only has 4 corners, this funny add-on doesn't match very well. But it actually matches well enough to add some certainty to the match.
The actual degree of matching was nearly 80% on 3 of the corners, about 50% on the corner with the bookshelf and clock, and 10% on problematic corner #5.
Without any fiddling, this was the only statistically-significant match at the 80% level, saving any embarrassment over having to choose randomly between two or more interpretations (i.e. the robot analogue of an optical illusion).
The model that produced this match was:
Which matches the actual room and actual robot position to within 10 cm.
The info took the pan on the camera about 20 secs to capture, and the PC about 0.1 sec to process.