Studying why we see in 3D is essential to understanding the basics of stereoscopic filmmaking and why some 3D images look better than others. The brain mechanisms of vision remained mysterious for a long time, and even today the very foundations of stereoscopic vision are still partially unknown.

Basics of 3D Vision

Many animals have two eyes, just as we do. In most mammals, the field of vision has a central area where the separate fields of the two eyes overlap. It is within this overlapping area that we are able to see in 3D. This is called binocular vision.
Binocular vision has many benefits: Experiments show that we detect and recognize shapes and objects better when we see them with both eyes, as in reading or any action requiring accuracy. Binocular vision is also a prerequisite for seeing in 3D, which is called stereoscopic vision.
Binocular vision alone, however, is not enough to guarantee stereoscopic vision. In fact, stereoscopic vision involves a large number of elements called “cues,” of which only two are related to binocular vision.
Figure 1.2 shows these so-called “stereoscopic cues” which will be described in more detail in the following pages.
fig1_1
Figure 1.1 Binocular vision
fig1_2
Figure 1.2 Binocular vision

Binocular Vision

The Retinal Disparity

The human eye has all the characteristics of an optical system, where images are formed upside down on the retina after passing through the lens.
fig1_3
Figure 1.3 The stereoscopic cues
fig1_4
Figure 1.4 Eye schematic
More specifically, the light rays coming from the object we are staring at (i.e., the object on which our eyes are converged) arrive at a very particular point on the retina called the fovea. This area of the retina comprises cone cells only and therefore has excellent optical resolution (see Figure 1.5).
Once the visual information is transmitted from the retina to the optic nerve, it is sent to the brain via the optic chiasm, where it is split in two: The information from the left side of both eyes’ retinas is routed towards the left hemisphere, and vice versa. Because the image on the retina is inverted, the “real” left visual field is interpreted in the right hemisphere, and vice versa.
How the visual areas of the brain process this information is still partially unknown. The leading theory of depth interpretation is based on comparison: The brain looks for similarities within the information from the two retinas, and pairs the various objects that appear to be the same. Then, the brain analyzes the difference in position of the identified objects on each retina. This difference is called retinal disparity, which the brain then translates into distance, and therefore depth.
Each eye perceives reality differently because of the horizontal separation of the eyes (called “interocular distance”), which provides two different points of view on objects. The farther away an object is, the less noticeable the difference between the two views. In the example in Figure 1.6, the eyes are converging on the green object, whose image is therefore projected onto the fovea of both eyes. The image of the yellow object, which is farther away, is projected at two different locations on the retinas. The brain interprets this difference of coordinates to evaluate the distance between the two objects.
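As a rough sketch of the geometry behind this example, assume that both objects lie straight ahead on the midline between the eyes, an idealization that the text does not state. The symbols are introduced here for illustration only: I is the interocular distance, d_f and d_o are the distances to the fixated object and to the other object, α is the vergence angle, and η is the resulting angular disparity.

```latex
\alpha(d) = 2\arctan\!\left(\frac{I}{2d}\right), \qquad
\eta = \alpha(d_f) - \alpha(d_o)
     \approx I\left(\frac{1}{d_f} - \frac{1}{d_o}\right)
     = \frac{I\,(d_o - d_f)}{d_f\, d_o}
```

The approximation holds when both distances are large compared with I. For a fixed gap between the two objects, the disparity then shrinks roughly with the square of the viewing distance, which is why the difference becomes harder to notice for distant objects.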
Stereoscopic vision is made possible largely, though not exclusively (as stated earlier), by analyzing two different perspectives on reality that are separated horizontally by the interocular distance. The mental process of deriving depth from these differences is called stereopsis.
Stereopsis allows a detailed analysis of distances: Under optimal conditions, we are able to detect retinal disparities as small as 2 to 6 arc seconds (the equivalent of the ability to distinguish a difference of 4 millimeters in depth at a distance of 15 feet). Stereopsis can be facilitated by high luminance and the presence of detailed textures in the visual field.
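The correspondence between arc seconds and millimeters quoted above can be checked with a small-angle approximation. The following is a minimal sketch, not a model of the visual system: it assumes an interocular distance of 65 mm (a typical adult value that the text does not give) and objects located straight ahead of the viewer. The function names are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180 * 3600 / math.pi  # about 206,265 arc seconds per radian
FEET_TO_M = 0.3048

def disparity_for_depth_step(depth_step_m, distance_m, interocular_m=0.065):
    """Angular disparity (arc seconds) produced by a small depth difference.

    Small-angle approximation: disparity ~ I * delta_d / d**2.
    The 65 mm default interocular distance is an assumption.
    """
    disparity_rad = interocular_m * depth_step_m / distance_m ** 2
    return disparity_rad * ARCSEC_PER_RADIAN

def depth_step_for_disparity(disparity_arcsec, distance_m, interocular_m=0.065):
    """Inverse relation: depth difference (meters) matching a given disparity."""
    disparity_rad = disparity_arcsec / ARCSEC_PER_RADIAN
    return disparity_rad * distance_m ** 2 / interocular_m

if __name__ == "__main__":
    distance = 15 * FEET_TO_M  # 15 feet, expressed in meters
    print(f"4 mm at 15 ft: {disparity_for_depth_step(0.004, distance):.2f} arc seconds")
    for threshold in (2, 6):
        step_mm = depth_step_for_disparity(threshold, distance) * 1000
        print(f"{threshold} arc seconds at 15 ft: {step_mm:.1f} mm")
```

Under these assumptions, a 4-millimeter depth difference at 15 feet corresponds to roughly 2.6 arc seconds, which falls inside the 2 to 6 arc second range given above. Because the depth step grows with the square of the distance, the same threshold corresponds to a much coarser depth resolution farther away.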
According to some studies, two or three types of neurons are involved in binocular vision. Whitman Richards, a cognitive sciences specialist, described in 1973 several categories of people who can’t see in 3D: those who don’t see any 3D at all, those who fail to see depth only beyond the object they’re converging on, and those who fail to see depth only between themselves and that object. He deduced the existence of “proximity neurons,” “distant neurons,” and probably an “area of vergence neurons.” More research is required before we fully understand the processes of vision, and stereoscopic vision in particular.
fig1_5
Figure 1.5 Human vision
fig1_6
Figure 1.6 Retinal disparity

Vergence

Vergence movement of the eyes involves a coordinated contraction of the muscles in both eyes and is accompanied by a contraction of the ciliary muscle, which sets the proper accommodation for the distance at which the eyes are converging. There is a physiological link between the movements of accommodation and vergence.
The angle of vergence is one of the other cues used by our brain for stereoscopic vision: The wider the angle, the closer the object. This cue is particularly effective for short distances (less than 6 feet from the eyes).
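To illustrate why this cue works best at short range, the vergence angle can be estimated from simple geometry. This is a minimal sketch under stated assumptions: the object is straight ahead of the viewer, and the interocular distance is 65 mm (a typical adult value that the text does not give). The function name is hypothetical.

```python
import math

FEET_TO_M = 0.3048

def vergence_angle_deg(distance_m, interocular_m=0.065):
    """Angle (degrees) between the two lines of sight converging on an object.

    Assumes the object is straight ahead, on the midline between the eyes.
    The 65 mm default interocular distance is an assumption.
    """
    return math.degrees(2 * math.atan(interocular_m / (2 * distance_m)))

if __name__ == "__main__":
    for feet in (1, 2, 6, 20, 100):
        angle = vergence_angle_deg(feet * FEET_TO_M)
        print(f"{feet:>4} ft: {angle:.2f} degrees")
```

With these assumptions, the angle is about 12 degrees at 1 foot and about 2 degrees at 6 feet. Beyond that, it changes very little with distance, which is consistent with the cue losing its effectiveness for distant objects.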
Many psycho-optical reflexes (automatic reflexes acquired during childhood) influence vergence and are not consciously controllable. For example, the fixation reflex rapidly rotates the eyes toward a point of attention detected by the peripheral retina (hence the term “eye-catching object”). There are also the following reflex (the eyes follow a moving object), the maintained fixation reflex (the head moves but the eyes stay converged on the same point), and the vergence reflex (the eyes stay converged on an object moving closer or farther away).
fig1_7
Figure 1.7 Accommodation and vergence

Diplopia and Horopter

Let’s focus now on the 120-degree field of binocular vision, which is of particular interest to us. Stereoscopic vision depends on a fundamental phenomenon named diplopia: It is the non-fusion of objects whose retinal disparity exceeds a certain threshold.
If this sounds a bit complicated, a ...