Visual Processing

Computational Psychophysical and Cognitive Research

About this book

This highly original and interesting monograph puts forward ideas on visual processing and representation in the early stages of visual perception, and examines the computational requirements of the system and its psychological performance.

Initially the author considers the computational theory of how the maximum amount of useful information about the scene can be registered from the variations in light intensity in the retinal image. He then goes on to address the question of just what it means to say that the visual system measures spatial aspects of the retinal image, and the consequences of the inevitable distortions that are introduced. He believes that the calculation of spatial position within a distorted metric is not trivial and requires dynamic processes with memory and control. Finally, Dr. Watt argues that the strength of the link between the low-level approaches of psychophysics and computational theory and the high-level approaches of cognitive visual function lies in the logic of the arguments that indicate the computational need for control. This essay will be of great interest to researchers in computer vision, perception, cognitive science and cognitive psychology.


1

Introduction

Light is the freely available messenger that allows us to sense remote objects in our environment without the need to interact with them directly. This is the modern view of vision that began with the Persian philosopher and physicist, Alhazen (or more properly, abu-‘Ali Al Hasan ibn Al Haytham, 965–1039). Vision would have been easier to understand if the older view, that light is emitted by the eye as a type of feeler, had been correct. Laser range finders work on this principle and are at present the only flawless way of measuring depth with light images. In the same way the colour of a surface, i.e. its reflectance, is easily computed if you know the position and nature of the source of light and the orientation of the surface.
The more difficult modern view, however, is the accepted version, and much of the rest of this essay will be concerned with the effects of unknown sources of light on an unknown arrangement of surfaces in the scene. Light is an uncertain messenger: It is not like a nice steady weight that can be reliably measured; it is a stream of random and largely independent particles, the photons, each of which has its own characteristic energy. The rate of arrival of photons at the eye is variable; we call this photon noise, and the variability depends on the mean rate. The mean rate of arrival is the intensity of the light. It is usually expressed as intensity per unit area, illuminance, which is similar to luminance, the intensity per unit area of emitted light. To avoid these cumbersome photometric units, I shall use the term grey-level to refer to the illuminance of the retina at an arbitrary small area.
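To make the noise statistics concrete, the following sketch (my illustration, not from the text) simulates photon arrival as a Poisson process; the variance of the count equals its mean, so the relative variability of the grey-level falls as the square root of the mean rate. The particular rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean arrival rates (photons per small retinal area per exposure).
for rate in (10, 100, 1_000, 10_000):
    # 10,000 independent observations of the same patch.
    counts = rng.poisson(lam=rate, size=10_000)
    # For a Poisson process variance = mean, so the coefficient of
    # variation (noise relative to signal) should be about 1/sqrt(rate).
    cv = counts.std() / counts.mean()
    print(f"mean rate {rate:6d}: observed CV = {cv:.3f}, 1/sqrt(rate) = {rate ** -0.5:.3f}")
```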
In vision, measurements of the intensity of light sources are not of interest.
The source is incidental; the major interest is the disposition of reflecting surfaces in three-dimensional space. Light from the sources is reflected at surfaces, perhaps many times, and some eventually enters the eye. The grey-level or intensity at a particular place in the retinal image is determined by the output and positions of the sources, the reflectance, depth, and orientation of the surface imaged at that point with respect to the sources and the observer, and any mutual illumination of surfaces (i.e. light reflected from one surface to another, which is then illuminated directly from the source and also indirectly via the first surface).
If one knows all these details, then the grey-level at that particular point in the image can be calculated. The problem in vision is that these processes cannot be reversed: The grey-level on its own does not distinguish between the various factors causing it. In principle, any given retinal image could arise from an infinity of possible scenes, including a flat uniform surface illuminated by a patterned light source (the principle behind slide and cine projection). In practice, of course, we are very rarely faced with any operational ambiguity. The visual system generally manages to make a choice concerning the scene, and it is usually correct. This choice is made on the basis of assumptions concerning the most likely types of scenes. The scenes that we inhabit are generally constrained.
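The forward computation is easy to write down; it is the inversion from a single grey-level that is under-determined. The sketch below assumes a simple Lambertian surface (my illustrative assumption, not a model proposed in the book) and shows three different combinations of source strength, reflectance, and surface orientation that yield the same grey-level.

```python
import numpy as np

def grey_level(source_intensity, reflectance, surface_normal, light_direction):
    """Forward image formation for one point on a matte (Lambertian) surface.

    The grey-level follows from the source output, the surface reflectance,
    and the angle between the surface normal and the direction to the source.
    """
    n = surface_normal / np.linalg.norm(surface_normal)
    l = light_direction / np.linalg.norm(light_direction)
    # Received light falls off with the cosine of the angle to the source.
    return source_intensity * reflectance * max(0.0, float(np.dot(n, l)))

# Three different "scenes", one grey-level -- the inverse problem is ambiguous.
print(grey_level(100.0, 0.8, np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])))  # 80.0
print(grey_level(200.0, 0.4, np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0])))  # 80.0
print(grey_level(141.4, 0.8, np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0])))  # ~80.0
```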

OUTPUT REQUIREMENTS OF VISUAL PROCESSING

Vision exists so that we can see what to do. Ultimately visual tasks require a full scene description in terms of the visible bodies, their shapes and sizes, their positions and motions, and their surface colours and markings. We are a long way from understanding how this is done, but we can, for simplicity’s sake, break the process down into a number of sub-processes.
Marr’s analysis of the architecture of low-level vision is currently the most widely used (see Marr, 1982), even though there are doubts about many details. Marr divided the process of vision into three sub-processes, each of which delivers a representation for the next sub-process. The three representations may be summarized as:
Primal Sketch:
A two-dimensional representation of significant grey-level changes in the image.
2.5D Sketch:
A partial three-dimensional representation recording surface distances from the observer.
Solid-model based representation:
A fully worked out volumetric representation of bodies in the scene.
There are significant concepts in this simple framework. A representation is a symbolic descriptor. It builds a description from a finite alphabet of primitive symbols (such as “edge”, “bar”, “corner”), each having an associated attribute list (recording: size, orientation, contrast, for example) and a grammar or set of rules that will exactly and exclusively generate all valid sentences or scene descriptions. A sketch is the process that produces, analyses, and represents the data.
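As a purely hypothetical illustration of what a symbolic descriptor of this kind might look like in a program, the sketch below defines a tiny alphabet of primitive tokens, each carrying an attribute list; the attribute names are my own choices, not a specification taken from Marr or from this book.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Primitive(Enum):
    EDGE = "edge"
    BAR = "bar"
    CORNER = "corner"

@dataclass
class Token:
    """One primitive symbol together with its attribute list."""
    kind: Primitive
    x: float            # image position
    y: float
    orientation: float  # degrees
    size: float         # spatial extent of the feature
    contrast: float     # amplitude of the associated grey-level change

# A toy scene description is then a list of such tokens.
description: List[Token] = [
    Token(Primitive.EDGE, x=12.0, y=30.5, orientation=90.0, size=4.0, contrast=0.6),
    Token(Primitive.BAR,  x=40.0, y=18.0, orientation=45.0, size=2.5, contrast=0.3),
]
```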
This essay is concerned only with the Primal Sketch, the reason being that there are several psychophysical and psychological studies that indicate that the Primal Sketch is far from dull and straightforward. Whereas Marr tended to regard it as an inflexible, automatic, memoryless process producing something rather like an edge map, I shall describe some evidence that points to high-level control and memory very early in the process. It will be argued that many of the visual attention phenomena have their roots, trunk, and some branches in the Primal Sketch, and that there is a particularly striking simplicity to the machinery that belies a wonderfully rich diversity of function.

Output Requirements of the Primal Sketch

Why have a Primal Sketch? Why not just have a 2.5D Sketch as the first stage? The motive for the existence of a separate Primal Sketch in Marr’s work is relatively simple. The image itself has a great deal of information that is irrelevant to the 2.5D Sketch, and the Primal Sketch can be used to provide an economical representation. A second reason is that many of the computational problems in the 2.5D Sketch, such as stereo matching, would be hopelessly confounded if they had to operate on raw grey-levels rather than, for example, edge tokens. To these two reasons, one can add the obvious statement that many visual processes require a representation of the grey-level changes, not a 2.5D Sketch. Imagine how text would have to be written if we had no access to a Primal Sketch.
It seems reasonable to require that the Primal Sketch squeeze as much meaningful information out of the image as possible. Only part of this will be relevant for the 2.5D Sketch. One ultimate goal of vision is an understanding of the layout of bodies in three-dimensional space. The term body is used here to refer to a compact solid mass that remains coherent, at least over the time scale of perception. Our perceptual understanding of the scene is going to be in terms of objects, which do not necessarily correspond to bodies. A tree in winter is one body, but may be represented as a hierarchy of objects: its overall bulk, the trunk, and largest boughs, or individual twigs. The term object refers to a unit of perception. Bodies cause the input to vision; objects cause behaviour that is the output of vision.
The main task of the Primal Sketch is therefore to extract from the image all the relevant information about the layout and character of visible surfaces and to construct a convenient representation. Ideally the representation at this level of processing will be used in turn for all other subsequent processes, and so it must be a rich source of information. The Primal Sketch representation will be used to construct a depth representation, using for example information about occlusions, shading, texture gradients, and disparity differences (if there are two Primal Sketches, one per eye). The interesting parts of images concern the locations where surfaces become occluded, especially where surfaces occlude themselves by turning away from the observer, and the locations where surfaces bend even though they may remain visible, such as sharp creases.
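To give a flavour of how grey-level changes might be pulled out of an image, here is a deliberately minimal sketch (my own illustration, not the model developed in later chapters): a 1-D luminance profile is smoothed with a Gaussian to suppress photon noise, and positions where the derivative peaks above a threshold are reported as candidate edge locations.

```python
import numpy as np

def mark_grey_level_changes(profile, sigma=2.0, threshold=0.1):
    """Return indices where a 1-D luminance profile changes sharply."""
    # Gaussian smoothing kernel, truncated at +/- 3 sigma.
    x = np.arange(-3 * sigma, 3 * sigma + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    smoothed = np.convolve(profile, kernel, mode="same")
    deriv = np.abs(np.gradient(smoothed))
    # Keep local peaks of the derivative that exceed the threshold.
    return [i for i in range(1, len(deriv) - 1)
            if deriv[i] > threshold and deriv[i] >= deriv[i - 1] and deriv[i] >= deriv[i + 1]]

# A noisy step edge at position 50 in a 100-sample row.
rng = np.random.default_rng(1)
row = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)]) + rng.normal(0, 0.02, 100)
print(mark_grey_level_changes(row))  # index (or indices) near 50
```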
Figure 1.1 shows a ground plan for a scene and marks the position of an observer, O. The scene has three walls, W1–W4, within which there are five bodies: an upright circular cylinder, C; two boxes with rectangular cross-sections, S and R; an upright block with triangular cross-section, T; and a sphere, B. Various lines-of-sight from the observer are also shown, and each reaches a point of particular interest, which we might require the Primal Sketch to identify. For example, the cylinder occludes itself at points C1 and C2. There is no sudden change in the character of the cylinder surface here, but to the observer, these two points will appear to be distinctive as the edges of the cylinder because there is a discontinuity in surface depth from the observer. Very often such occluding edges also correspond to discontinuities in surface orientation, i.e. creases or corners, such as points R1, R3, and W1. Creases and corners can also be imaged so that they do not correspond to the occluding edges of objects, as at points T2 and R2.
FIG. 1.1. A plan of an imaginary scene. The image formed by the observer at O will contain segments corresponding in sequence to the various visible surfaces. These segments are bounded by the lines-of-sight that are drawn from O to each point of surface occlusion or surface creasing. The problem for the Primal Sketch is to discover these lines-of-sight from the image and then to represent their spatial relations and their nature or probable cause (i.e. occlusion or crease).
The top of Fig. 1.2 shows the equivalent range or depth map for the observer at point O. This might be a useful precursor to a full reconstruction of Fig. 1.1 because it records the distance from O to the nearest reflecting surface in each direction. Figure 1.2 also shows the variation in the orientation of the visible surfaces in the scene with respect to the observer at O. Surface orientation is important because it determines the surface luminance: Surfaces that are head-on to the source of light have a higher illumination level per unit surface area than those oblique to the source. Notice that the lines-of-sight from the observer to occluding edges correspond to sudden changes in range from the observer and sometimes also in orientation. The lines-of-sight to creases, such as T2 and R2, correspond to abrupt changes in surface orientation, but not in range.
FIG. 1.2. A range map (top) and surface orientation map (bottom) for the scene of Fig. 1.1 from the point O. The different surface segments shown in that figure now correspond to areas where range and surface orientation change smoothly, and the segment boundaries are ...
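The distinction drawn above can be stated almost mechanically: an occluding edge shows a jump in range, whereas a crease shows a jump in surface orientation without a jump in range. The toy classifier below runs along a 1-D sweep of lines-of-sight; the profiles and thresholds are invented purely for illustration.

```python
import numpy as np

def classify_boundaries(range_map, orientation_map,
                        range_jump=0.5, orient_jump=20.0):
    """Label directions along a 1-D sweep of lines-of-sight.

    range_map:       distance to the nearest surface in each direction
    orientation_map: surface slant (degrees) in each direction
    A large change in range marks an occluding edge; a large change in
    orientation without a range change marks a crease.
    """
    labels = {}
    for i in range(1, len(range_map)):
        d_range = abs(range_map[i] - range_map[i - 1])
        d_orient = abs(orientation_map[i] - orientation_map[i - 1])
        if d_range > range_jump:
            labels[i] = "occluding edge"
        elif d_orient > orient_jump:
            labels[i] = "crease"
    return labels

# Invented profiles: a crease at index 3, an occlusion at index 6.
ranges  = np.array([2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 4.0, 4.0])
orients = np.array([10.0, 10.0, 10.0, 55.0, 55.0, 55.0, 30.0, 30.0])
print(classify_boundaries(ranges, orients))  # {3: 'crease', 6: 'occluding edge'}
```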

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Dedication
  6. Table of Contents
  7. Acknowledgements
  8. Preface
  9. 1. Introduction
  10. 2. A Model for the Primal Sketch
  11. 3. Measurements, Metrics, and Distortions
  12. 4. Calculating Values for Spatial Position with Grouping
  13. 5. Control of Primal Sketch Processing
  14. 6. Synopsis: Low-level Vision as an Active Process
  15. References
  16. Indices
  17. Subject Index