Advances in Computer Vision
eBook - ePub

Advances in Computer Vision

Volume 1

  1. 200 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Advances in Computer Vision

Volume 1

About this book

First published in 1988. The series Advances in Computer Vision has the goal of presenting current approaches to basic problems that arise in the construction of a computer vision system, written by leading researchers and practitioners in the field. The first two volumes in the series comprise seven chapters, which together cover much of the scope of computer vision. This is Volume I.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Advances in Computer Vision by C. Brown,Christopher Brown in PDF and/or ePUB format, as well as other popular books in Psychology & Cognitive Psychology & Cognition. We have over one million books available in our catalogue for you to explore.
1 THE VISIONS IMAGE-UNDERSTANDING SYSTEM
A. HANSON
E. RISEMAN
University of Massachusetts
In this chapter we consider some of the problems confronting the development of general integrated computer-vision systems and the status of the VISIONS project, which has become an experimental testbed for the construction of knowledge-based image interpretation systems. The goal is the construction of a symbolic representation of the three-dimensional world depicted in a two-dimensional image, including the labeling of objects, the determination of their location in space, and to the degree possible, the construction of a surface representation of the environment.
Our system involves three levels of processing for static image interpretation. Low-level processes manipulate pixel data and produce intermediate symbolic events such as regions and lines with their attributes. High-level processes focus attention on aggregates of these events via rule-based object hypotheses in order to selectively invoke schemas, which contain more complex knowledge-based interpretation strategies. Intermediate-level processes carry out grouping and reorganization of the error-prone symbolic representation extracted from the sensory data, utilizing both ā€œtop-downā€ control of the processing by the schema interpretation strategies as well as ā€œbottom-upā€ data-directed organization of interesting perceptual events. Our design is being extended to integrate the results of motion and stereo processing throughout the three levels of processing, with depth arrays at the lowest level, partially correct surfaces at the intermediate level, as well as two-dimensional and three-dimensional motion attributes where appropriate.
In addition, a highly parallel three-level associative architecture is being developed to achieve real-time dynamic vision capabilities by allowing a bi-directional flow of information and processing across the levels. The machine will employ MIMD processing at the high level and Multi-SIMD (i.e., parallel local SIMD) under MIMD control at both the intermediate and low levels.

INTRODUCTION

This chapter discusses a range of problems in computer vision and describes the VISIONS system,1 an approach to the construction of a general vision system that relies on knowledge-based techniques for image interpretation. The goal of this effort is the construction of a system capable of interpreting natural images of significant complexity. Over the past 12 years, the VISIONS group at the University of Massachusetts has been evolving the system while applying it to natural scenes, such as house and road scenes, aerial images, and biomedical images. Our philosophy is that it is reasonable to expect that each new domain will require a different knowledge base, but most of the system should remain the same across task domains. This paper documents the status of the system in mid-1986 as its development continues. Note that we do not attempt to carefully survey the literature on knowledge-based vision; partial reviews may be found in [16, 17, 29, 31, 54] and some representative individual research efforts are described in [13, 15, 18, 19, 32, 42, 48, 52, 63, 70, 82, 88, 89, 92, 93, 99, 100, 104, 107, 110, 119, 122, 123, 128, 133, 134].

The Problem of Image Interpretation

Our research has concentrated to a large extent on the identification of objects in static color images of natural scenes by associating two-dimensional image events with object descriptions [19, 20, 21, 53, 54, 55, 56, 57, 58, 59, 60, 61, 72, 106, 109, 111, 113, 114, 115, 116, 117, 118, 148, 149, 150]. However, at a more general level, the VISIONS design utilizes many stages of processing in the transformation from ā€œsignalsā€ to ā€œsymbols,ā€ or to use more specific terminology, from two-dimensional image events to object labels and three-dimensional hypotheses. Unless the domain is extremely simple and heavily constrained so that object matching processes can be applied directly to the image (e.g., via template matching), there must be some form of sensory processing that extracts information from an image to produce an intermediate representation.
This representation must then be refined and associated with semantic events by making use of general knowledge and constraints provided by the physical world represented in the scene domains of interest. This implies that the intermediate events must be mapped to hypotheses about the content and structure of the scene. Once a partial mapping is achieved a variety of interpretation strategies can be employed for matching, verifying, and refining contextual structures, and for grouping incomplete and ambiguous image data into organized perceptual structures.
The construction of a three-dimensional representation of the environment from a static image is an additional aspect of our work. We have made some progress toward this goal (55, 98, 149, 159, 160), but it remains a difficult problem. While the results presented in this chapter focus more on the problem of object labeling in a static monocular two-dimensional image, the system presented here can be naturally extended to include stereo and/or motion analysis of multiple images. The presence of three-dimensional data significantly improves the ability to develop a three-dimensional representation of surfaces and would make the interpretation process easier and more robust, although the problem is still extremely challenging. Thus, our current system is being extended to use depth maps of the visual field produced by motion and stereo algorithms as part of the three-dimensional interpretation process.

Ambiguity and Context

The complexity of the visual task can be made explicit by examining almost any image of a natural scene. Even though it is very difficult to be introspective of one's own visual processing, we believe the conjectures and qualitative discussion in this subsection are intuitive and reasonable.
Humans are rarely aware of any significant degree of ambiguity in local portions of the sensory data, nor are they aware of the degree to which they are employing more global context and stored expectations derived from experience (26, 27, 131, 132). However, if the visual field is restricted so that only local information about an object or object-part is available, interpretation is often difficult or impossible. Increasing the contextual information, so that spatial relations to other objects and object-parts are present, makes the perceptual task seem natural and simple. Consider the scenes in Figure 1.1 and the close-up images in Figure 1.2. In each case subimages of objects have been selected which show:
• ā€œprimitiveā€ image events—image events which convey limited information about the decomposition of an object into its parts (which of course is a function at least partly of resolution); note that this implies that the path to recognition of the object via subparts is not available to our perceptual system;
• absence of context—information about other objects which might relate to the given object in expected ways is limited; note that this implies that the path to recognition of the object, via the scenes or objects of which it is a part, is not available to our perceptual system.
image
image
FIGURE 1.1. Original images. These images are representative samples from a larger data base that is used in our experiments. All are 256 Ɨ 256 pixels in spatial resolution; the color resolution is 8 bits in each of the red, green, and blue components, except h, which is monochromatic.
In Figure 1.2, as some of the surrounding context of the shoes and the head are supplied, the perceptual ambiguity disappears and the related set of visual elements is easily recognized. In each of the cases illustrated in Figures 1.1 and 1.2 the purely local hypothesis is inherently unreliable and uncertain, and there may be little surface information to be derived in a bottom-up manner. It appears that human vision is fundamentally organized to exploit the use of contextual knowledge and expectations in the organization of the visual primitives. However, it may be impossible to associate object labels with these ambiguous primitives until they are grouped into larger entities and collectively interpreted as a related set of object or scene parts (28). Thus, the inclusion of knowledge-driven processes at some level in the image interpretation task, where there is still a great degree of ambiguity in the organization of the visual primitives, appears inevitable.
We conjecture that image interpretation initially proceeds by forming an abstract representation of important visual events in the image with-out knowledge of its contents. These primitive elements will be associated with tokens (forming a symbolic representation of the image data), which are then collected, grouped, and refined to bring their collective description into consistency with high-level semantic structures that represent the observer's knowledge about the world.
image
FIGURE 1.2. Closeup subimages from original images. In many cases, the identity or function of an object or object part cannot be determined from a small local view. Only when the surrounding context becomes available can the objects be recognized.

Issues in Image Interpretation

There are a variety of issues that must be addressed if robust vision systems are to be developed. We have identified some of the key problems that we believe are critical to the interpretation process and which are addressed in the following sections. The reader should note, however, that we are not attempting to discuss all of the issues that are open in vision, since it would lead to a chapter far longer than this one. Thus, in summary, we have abstracted the following set of overlapping problems that must be faced:
• Reliability of Segmentation Processes. Segmentation, or low-level, processes vary in quite unreliable and unpredictable ways due to a variety of confounding factors, which include complex uncontrolled lighting, highlights and shadows, texture, occlusion, complex three-dimensional shapes, and digitization effects.
• Uncertainty. There is inherent uncertainty in every stage of processing; as the data is transformed it must be kept constrained in a way that preserves some degree of integrity and avoids large search spaces.
• Information Fusion. Mechanisms are needed for integrating multiple sources of information, including multiple sensory sources, multiple representations output from low-level processes, and evidence from multiple knowledge sources.
• Representation. It is difficult to represent natural objects and scenes that exhibit a tremendous variability in their three-dimensional shape, color, texture, and size and, hence, in many of their measurable characteristics in the two-dimensional image. Yet such a representation is clearly needed in order to capture the structural, geometric, and spatial information required for visual perception.
• Local vs. Global Processing in Matching. There is a general need to provide information about the global context to the more localized processes attempting to reach some decision about a portion of the image; the process of hypothesis generation must work in the face of incomplete and inconsistent information.
• Inference. Computational machinery is needed for assessing the indirect implication of direct, but uncertain, evidence; this inferencing capability must be able to deal with all of the aforementioned issues.

Difficulties with Segmentation

Probably the most fundamental problem blocking knowledge...

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
  6. Contributors
  7. Introduction
  8. 1 The Visions Image-Understanding System
  9. 2 Robust Computation of Intrinsic Images from Multiple Cues
  10. 3 Image Flow Theory: A Framework for 3-D Inference from Time-Varying Imagery
  11. Author Index
  12. Subject Index