3D shape: Perhaps the most defining test of how well 3D motion is recovered is the estimation of shape. Many tasks can be achieved with slightly incorrect 3D motion, but an error of a few pixels (for example, in the image location of the translation direction) is enough to create significant errors in the estimated shape. How well shape models can be estimated depends on a number of factors besides accurate 3D motion estimates, such as the number of frames used (the amount of data) and the representation of the model. For synthetic data the results are close to perfect. For example, for the well-known Yosemite sequence, the extracted shape is shown here in the form of a mesh, and here with painted texture; only one normal flow field was used. Video 6 shows a model of the scene described before, recovered from a few frames and without any elaborate data structures; the model is simply a set of 3D points. A bit more sophistication in representing the scene (triangles) yields much better models in our approach: see, for example, this reconstruction obtained from one flow field. This sequence shows the parts of the recovered model that contain no discontinuities. More frames also result in better models: Video 7 shows an original sequence and Video 8 the obtained reconstruction; Videos 9 and 10 show another example. No post-processing was performed here, although graphics post-processing would clearly improve the results further. Finally, consider a reconstruction from multiple videos (Video 11, Video 12, Video 13). Video 14 shows the recovery, which is almost perfect; again, no post-processing was performed.
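To make the shape-from-motion step concrete, the following is a minimal sketch (not the estimator used above) of how inverse depth follows from a flow field once the camera motion is known. It uses the standard calibrated flow equations (focal length 1): the rotational component of the flow is depth-independent, so subtracting it leaves a field parallel to the translational direction, scaled by 1/Z at each pixel. The function name and the least-squares formulation are our illustrative assumptions.

```python
import numpy as np

def inverse_depth_from_flow(x, y, u, v, t, w):
    """Per-pixel least-squares inverse depth from optical flow (u, v)
    at calibrated image coordinates (x, y), given known camera
    translation t = (tx, ty, tz) and rotation w = (wx, wy, wz).
    Illustrative sketch only, not the paper's estimator."""
    tx, ty, tz = t
    wx, wy, wz = w
    # Rotational flow component: independent of scene depth.
    u_rot = wx * x * y - wy * (1 + x**2) + wz * y
    v_rot = wx * (1 + y**2) - wy * x * y - wz * x
    # Translational direction field; the actual translational flow
    # is this field scaled by inverse depth 1/Z.
    a_u = tz * x - tx
    a_v = tz * y - ty
    # De-rotated flow = (1/Z) * (a_u, a_v); solve for 1/Z per pixel.
    num = a_u * (u - u_rot) + a_v * (v - v_rot)
    den = a_u**2 + a_v**2 + 1e-12
    return num / den  # estimated 1/Z at each pixel
```

With only normal flow available (as in the text), the same idea applies after projecting both sides onto the local gradient direction; errors in t shift the translational field and hence distort the recovered depth, which is why a few pixels of error in the translation direction visibly damages the shape.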
Motion segmentation: This is the hardest problem in dynamic scene analysis, and our approach was conceived with this problem in mind. Video 15 shows an original, well-known sequence. An elaborate optimization scheme with feedback starts from the normal flow values and builds representations of the camera motion and localizations of motion and background boundaries; the principle of depth variability plays a central role. Video 16 shows the recovered inverse depth for a part of the sequence, with the gray-level value encoding the amount of depth (white denoting large positive values, i.e., points close to the camera, and black denoting negative values). Notice the high variability of depth at the locations of independent movement. Also notice that, at times, the train's motion is consistent with the camera motion (making independent motion detection difficult), so no high depth variability is obtained; instead, the depth comes out negative, marking the independent motion. Video 17 shows the depth variability measurements for the same part of the sequence (white denoting large values). The procedure searches over camera motions and motion boundaries, and depth variability is the basis for the solution.
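The two cues described above, negative inverse depth and high local depth variability, can be sketched as a simple pixel classifier. This is an illustrative reduction of the idea, not the optimization scheme itself; the window size `k` and threshold `var_thresh` are assumed values for the example.

```python
import numpy as np

def independent_motion_mask(inv_depth, k=5, var_thresh=0.05):
    """Flag independently moving pixels in an inverse-depth map
    estimated under a candidate camera motion, using the two cues
    from the text:
      (1) negative inverse depth is impossible for the static scene,
          so it marks independent motion;
      (2) unusually high local variability of depth also marks it.
    k (window size) and var_thresh are illustrative assumptions."""
    neg = inv_depth < 0
    # Local variance over k x k windows (valid interior region).
    win = np.lib.stride_tricks.sliding_window_view(inv_depth, (k, k))
    var = win.var(axis=(-2, -1))
    high_var = np.zeros_like(neg)
    off = k // 2
    high_var[off:off + var.shape[0], off:off + var.shape[1]] = var > var_thresh
    return neg | high_var
```

The negative-depth cue is what rescues the hard case mentioned above: when the train moves consistently with the camera, depth variability stays low, but fitting the camera motion to those pixels forces the depth sign to flip, so the first test still fires.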