Concepts: Our work here addresses the question of what geometry and statistics can tell us about the brain. We have an understanding of how an intelligent system could process an image sequence. The problems of 3D motion estimation, segmentation (motion boundaries), image motion estimation, and 3D structure estimation are solved in a synergistic manner. Starting with the spatiotemporal derivatives of the image function, the system should implement the positive depth constraint at the early stages and the depth variability constraint at later stages, with considerable feedback between them. Let us concentrate here on the first constraint, which can be invoked using very simple measurements of image motion. It states simply that the scene in view is in front of the eye: if one searches for the 3D motion and computes depth from image motion measurements, the computed depth must be positive (or negative, depending on the coordinate system convention).
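As a minimal sketch of how the constraint can be used (all numbers here are made up, and the model is purely translational motion under focal length 1), one can score a candidate 3D translation by the fraction of image points whose implied depth comes out positive:

```python
import numpy as np

def depth_from_translation(x, y, u, v, U, V, W):
    """Depth implied at each image point by a candidate translation (U, V, W).

    For purely translational motion with focal length 1 the image flow is
        u = (x*W - U) / Z,    v = (y*W - V) / Z,
    so each flow component yields an estimate of the depth Z."""
    eps = 1e-12
    Zu = (x * W - U) / np.where(np.abs(u) > eps, u, np.nan)
    Zv = (y * W - V) / np.where(np.abs(v) > eps, v, np.nan)
    return Zu, Zv

def positive_depth_score(x, y, u, v, U, V, W):
    # Fraction of recovered depths that are positive; the correct candidate
    # should put (nearly) the whole scene in front of the eye.
    Zu, Zv = depth_from_translation(x, y, u, v, U, V, W)
    Z = np.concatenate([Zu, Zv])
    Z = Z[np.isfinite(Z)]
    return float(np.mean(Z > 0)) if Z.size else 0.0

# Synthetic scene: random image points with strictly positive depths.
rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
Z = rng.uniform(1.0, 5.0, 200)
U, V, W = 1.0, 0.0, 0.2            # hypothetical true translation
u, v = (x * W - U) / Z, (y * W - V) / Z

score_true = positive_depth_score(x, y, u, v, U, V, W)
score_wrong = positive_depth_score(x, y, u, v, -U, -V, -W)
```

The true translation recovers only positive depths, while the opposite translation recovers only negative ones, so the constraint cleanly separates the two.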

An important advance was marked by our discovery of global patterns in moving images that encode information about 3D motion. Specifically, consider two classes of orientation fields, as in these figures. The first represents orientations perpendicular to a pencil of lines; by changing the center of the pencil, a new field is created. We call such fields co-point fields. The second represents orientations normal to a class of conic sections. These conics arise as the intersection of the image plane with a class of cones that have their apex at the eye's nodal point and whose central axis is any line through the nodal point; by changing that axis, a new field arises. We call such fields co-axis fields.
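Both field classes can be written down directly. The sketch below (coordinates and axis parameters are illustrative, focal length 1) constructs the co-axis field as the normalized gradient of the angle between the viewing ray and the axis, which by construction is normal to the conic level curves:

```python
import numpy as np

def copoint_field(x, y, x0, y0):
    # Unit vectors perpendicular to the pencil of lines through (x0, y0).
    dx, dy = x - x0, y - y0
    n = np.maximum(np.hypot(dx, dy), 1e-12)
    return -dy / n, dx / n

def coaxis_field(x, y, A, B, C):
    # The conics of the family are level sets of the angle between the ray
    # (x, y, 1) through the nodal point and the axis (A, B, C); the field
    # is the normalized gradient of cos(angle), hence normal to each conic.
    r = np.sqrt(x**2 + y**2 + 1.0)
    g = (A * x + B * y + C) / r
    gx = A / r - g * x / r**2
    gy = B / r - g * y / r**2
    n = np.maximum(np.hypot(gx, gy), 1e-12)
    return gx / n, gy / n
```

For example, for the axis (0, 0, 1) (the optical axis itself) the conics are circles about the image center and the co-axis field is radial.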

If one then considers the sign of the image motion along the orientations of any such field, the result is a 2D array containing three symbols: "+", "-", and "0". The positive depth constraint does not allow these +'s, -'s, and 0's to appear just anywhere; their locations are constrained by the underlying 3D motion. In particular, each pattern consists of positive and negative values separated by a straight line and a conic. Localizing these patterns provides information about the 3D motion; these figures explain this in detail. Thus, in this implementation, the problem of 3D motion estimation becomes a pattern recognition problem, in which global patterns of measurements of the sign of image motion provide the desired information.
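A small sketch of how such a pattern arises (the translation, depths, and field center are all invented numbers): projecting a purely translational flow onto a co-point field gives signs whose layout does not depend on the unknown depths, because the depth enters only as a positive factor, and the '+' and '-' regions are split by a straight line.

```python
import numpy as np

def copoint_sign_pattern(x, y, u, v, x0, y0):
    # Sign (+1, -1, 0) of the flow (u, v) projected on the
    # co-point field with center (x0, y0).
    return np.sign(-(y - y0) * u + (x - x0) * v)

rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500)
Z = rng.uniform(1.0, 10.0, 500)        # unknown, but positive, depths
U, V, W = 0.5, -0.3, 1.0               # hypothetical translation
u, v = (x * W - U) / Z, (y * W - V) / Z

signs = copoint_sign_pattern(x, y, u, v, x0=0.8, y0=0.2)

# Because Z > 0, each sign equals that of a depth-free function of (x, y),
# which for pure translation is linear, so the '+' and '-' regions meet
# along a straight line regardless of the scene.
h = -(y - 0.2) * (x * W - U) + (x - 0.8) * (y * W - V)
```

Here `h` is exactly `Z` times the projected flow, so `np.sign(h)` reproduces the measured sign array without ever knowing the depths.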

One can easily envision an architecture that, using neurons, implements a global decomposition of the normal motion field for locating the patterns mentioned above and thus constraining 3D motion. There is a very large literature on the properties of neurons involved in motion analysis. The modules found to be involved in the early stages of motion analysis are the retinal Pα ganglion cells, the magnocellular neurons of the LGN, layer 4Cα of V1, layer 4B of V1, the thick bands of V2, and MT. Together these elements are referred to as the early motion pathway. Among others, they feed further motion processing modules, namely MST and FST, which in turn connect to the parietal lobe. Here we concentrate on two striking features: the change in the spatial organization of the receptive fields and the selectivity of the receptive fields for motion over the early stages of the motion pathway. The computational modeling of visual motion interpretation that we described above appears consistent with our knowledge of the organization and functional properties of neurons in the early stages of the motion pathway of the visual cortex. In addition, our computational theory creates a hypothesis about the way motion is handled in the cortex and suggests a series of experiments for examining it.

This figure shows an outline of the process to be explained, which involves four kinds of cells with different properties. In the early stages, from the retinal Pα ganglion cells through the magnocellular LGN cells to layer 4Cα of V1, the cells appear functionally homogeneous and respond almost equally well to the movement of a bar (moving perpendicularly to its orientation) in any direction (Figure a). Within layer 4C of V1 we observe the onset of orientation selectivity. The receptive fields of the neurons here are divided into separate excitatory and inhibitory regions, arranged in parallel stripes; this arrangement gives the neurons a preference for a particular orientation of a bar target (displayed in the polar diagram) (Figure b). In layer 4B of V1 another major transformation takes place with the appearance of directional selectivity. The receptive fields here are relatively large and seem to be excited everywhere by light or dark targets; in addition, these neurons respond better or solely to one direction of motion of an optimally oriented bar target, and less or not at all to the other (Figure c). Finally, neurons in MT have considerably larger receptive fields and, in general, exhibit less precise selectivity for direction of motion than neurons in V1 (Figure d).

Thus, neurons of the first kind could be involved in estimating the local retinal motion perpendicular to the local edge (normal flow); neurons at this stage could be thought of as computing whether the projection of retinal motion along some direction is positive or negative. Neurons of the second kind could be involved in selecting local vectors in particular directions as parts of the various patterns discussed in the previous section, while neurons of the third kind could be involved in computing the sign (positive or negative) of pattern vectors over image areas; i.e., they might compute, for large patches of different sizes, whether the normal flow in certain directions is positive or negative. Finally, neurons of the last kind (MT and MST) could be the ones that piece together the parts of the patterns already developed into global patterns that are matched against pre-stored global patterns. Matches provide information about egomotion, and mismatches provide information about independent motion. Specific experiments could investigate this hypothesis.
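The last stage of this hypothesized pipeline can be caricatured in a few lines. The sketch below is only an abstraction, not a neural model: the template names and the linear separations used as "pre-stored patterns" are invented, and real patterns would come from the co-point and co-axis geometry.

```python
import numpy as np

def match_patterns(signs, templates):
    # Crude abstraction of the final stage: compare a measured sign pattern
    # with pre-stored global templates. A high score constrains egomotion;
    # locations disagreeing with the winning template would flag
    # independent motion.
    scores = {name: float(np.mean(signs == t)) for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy example on a grid (templates here are invented linear separations).
xs, ys = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
templates = {
    "candidate_A": np.sign(xs - 0.2),
    "candidate_B": np.sign(ys + 0.4),
}
measured = np.sign(xs - 0.2)           # pattern produced by "candidate_A"
best, scores = match_patterns(measured, templates)
```

The measured pattern agrees perfectly with one template and only partially with the other, so the matching score singles out the corresponding 3D motion candidate.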

Evidence for such a process also comes from a class of illusions that we discovered. This figure has the property that, upon extended viewing, movement is perceived inside the bands. The bands lie along co-axis fields; if the bands are not along co-point or co-axis fields, the illusion is not experienced.

Finally, our studies on the distortion of human visual space have explained, through our geometric framework, a large number of illusions and have predicted many more. See the references.

Language and Thought: See the dialogue, day 7.

Revised 1999/04/20