Handbook of Visual Communications

Hseuh-Ming Hang, John W. Woods

About This Book

This volume is the most comprehensive reference work on visual communications to date. An international group of well-known experts in the field provides up-to-date and in-depth contributions on topics such as fundamental theory, international standards for industrial applications, high definition television, optical communications networks, and VLSI design. The book covers both the fundamentals of image/video compression and more advanced topics in visual communications research. In addition, the Handbook of Visual Communications explores the latest developments in the field, such as model-based image coding, and provides readers with insight into possible future developments.

  • Displays comprehensive coverage from fundamental theory to international standards and VLSI design
  • Includes 518 pages of contributions from well-known experts
  • Presents state-of-the-art knowledge--the most up-to-date and accurate information on various topics in the field
  • Provides an extensive overview of international standards for industrial applications

Information

Year: 2012
ISBN: 9780080918549

Chapter 1

Video Data Compression

B.G. Haskell Visual Communications Research Department, AT&T Bell Laboratories, Holmdel, New Jersey

1.1 Introduction

A considerable effort has been underway for some time to develop inexpensive transmission techniques that take advantage of recent advances in electronic technology as well as expected future developments. Most of the attention has been focused on digital systems because, as is well known, noise does not accumulate in digital regenerators as it does in analog amplifiers and, in addition, signal processing is much easier in a digital format.
Progress is being made on two fronts. First, the present high cost per bit of transmitting a digital data stream has generated interest in a number of methods that are currently being evaluated for cost reduction. While these methods have general applications and are not confined to a data stream produced by a video signal source, it is important to remember that video bit rates tend to be considerably higher than those required for voice or data transmission. The most promising techniques for more economical digital transmission include optical fibers, digital satellite, broadband ISDN, and digital transmission over the air, among others.
The second front on which progress is being made involves reducing the number of bits that have to be transmitted in a video communication system. Bit-rate reduction is accomplished by eliminating, as much as possible, the substantial amount of redundant information that exists in a video signal as it leaves the camera. The amount of signal processing required to reduce the redundancy determines the economic feasibility of using this method in a given system. The savings that accrue from lowering the transmission bit rate must more than offset the cost of the required signal processing if redundancy reduction is to be economical.
Present costs of digital logic and digital memory are low enough to make this type of signal processing economically very attractive for use in long distance videoconferencing links over existing facilities. Furthermore, it is expected that the cost of digital logic and memory will continue to decline. Therefore, it is conjectured by those knowledgeable in the field that signal processing for bit-rate reduction will have an important part to play in all video systems, and in many cases, it could become the overriding factor determining economic feasibility.
To transmit video information at the minimum bit rate for a given quality of reproduction, it is necessary to exploit our understanding of many branches of science. Ideally, the engineer should have an appreciation of motion pictures, colorimetry, human vision, signal theory, display devices, and so on. As might be expected, any individual can have only a smattering of knowledge on such a diverse range of topics, and a specialist in any one topic will readily confess to a certain amount of ignorance even in his or her chosen field. As engineers we are concerned with complex stimuli and their human perception, as well as the final utilization of the perceived information. Knowledge of these is often unavailable or sketchy, forcing us to design encoders based on a relatively primitive understanding of the problem. The limits of bit-rate compression will be approached, we believe, only as our knowledge of stimuli, perception, and utilization increases.
Thus, in opening a discussion of video bit-rate compression we are very aware of our own limitations. Our modest objective of defining the state of the art is, we are well aware, open to the criticisms of oversimplification, serious omissions, and factual disagreement. As for where the subject is heading and its inherent limitations, we confess myopia and will not be surprised by a discovery that could not have been extrapolated from existing thinking and known ignorances.
But first let us set the stage for our discussion. The conventional representation of a digital communication link for the transmission of audio or pictorial information is shown in Fig. 1.1. The function of the source encoder is to operate on an analog of audio or pictures, x(t), and to convert it into a stream of binary digits, s(t). The source decoder at the receiver accepts a binary signal S(t) and produces a continuous signal X(t). It may not be necessary to ensure X(t) = x(t), but what does matter is that after transduction, e.g., by a loudspeaker or TV tube, X(t) should be perceived as x(t), subject to an acceptable quality criterion. Although x(t) does not always have to be identical to X(t), system engineers prefer s(t) = S(t); i.e., the channel appears ideal.
Most practical channels contain dispersion, nonlinearities, additive noise, multipath fading, interference from other channels, and so on. These imperfections are overcome largely by preprocessing and postprocessing the binary signals s(t) and S(t) by the channel codec and terminal equipment. The transmitting terminal equipment operates on c(t), the channel-encoded version of s(t), to produce (perhaps by conversion to multilevel, modulation, filtering, etc.) a signal f(t) that is suitable for combating the imperfections of the communication channel. The signal F(t) that emerges from the channel may differ considerably from f(t). After demodulation, a binary signal C(t) is regenerated using adaptive equalization of the channel and adaptive detection strategies. The binary signal C(t) is then channel decoded to produce S(t), which is passed to the source decoder.
Figure 1.1 Digital communication link for the transmission of audio or pictorial information.
The purpose of this book is mostly to discuss source encoding. However, Fig. 1.1 demonstrates that S(t) is dependent on the channel terminal equipment, the channel codec, and of course, the channel. Thus, encoding picture signals is not merely a source encoding problem, but may involve the complete communication system. For example, if the channel is known to result in a high bit error rate (BER), then the effect on the recovered signal X(t) may be mitigated by altering the modulation and regeneration strategies, increasing the number of check bits in the channel coding words, altering the source encoding algorithm, or combinations of all of these. The conventional arrangement of source and channel codecs may be altered, even merged. Postprocessing of X(t) can also be successfully employed.
Thus, we are interested in the source codec, its algorithms, how they relate to the signals it encodes, how the bit rate can be reduced by exploiting the source signal statistics and properties of human perception, the variety of quality criteria, the codec complexity, and above all, how these phenomena are interrelated and can be traded to approach an optimum design.
We therefore present a discussion of picture sources and our scant knowledge of the salient properties of human perception. Armed with this we describe the current state of the art in waveform and parameter coding and conclude with directions for the future, guessing at where we believe some ultimate limitations may be found.

1.1.1 Picture Sources

Video processing or transmission systems typically start with a two-dimensional distribution of light intensity. Thus, three-dimensional scenes must first be projected onto a two-dimensional plane by an optical imaging system. Color pictures can usually be represented by three such light intensity distributions in three primary bands of wavelengths. If moving objects are to be accommodated, the light intensity must change with time.
The two-dimensional light intensity distribution is then usually raster scanned to produce a one-dimensional waveform. Facsimile involves single pictures, while in television the scene is repetitively raster scanned (usually with interlace to avoid flicker). Black/white pictures, e.g., printed or handwritten text, line drawings, weather maps, produce a two-level or binary waveform.
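The raster scan described above can be sketched in a few lines. This is a minimal illustration, assuming a frame stored as a list of rows and a two-field interlace in which the even-numbered lines are sent first; the function name and the field order are choices made here, not taken from the text.

```python
def interlaced_raster_scan(frame):
    """Flatten a 2-D frame (a list of rows) into a 1-D sequence, field by field.

    Assumption: field 1 carries lines 0, 2, 4, ... and field 2 carries
    lines 1, 3, 5, ...; within a line, pels are read left to right.
    """
    even_field = [pel for row in frame[0::2] for pel in row]
    odd_field = [pel for row in frame[1::2] for pel in row]
    return even_field + odd_field


# A hypothetical 4-line, 3-pel frame used only for illustration.
frame = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
    [40, 41, 42],
]
waveform = interlaced_raster_scan(frame)
print(waveform)
# [10, 11, 12, 30, 31, 32, 20, 21, 22, 40, 41, 42]
```

A facsimile-style scan of a single picture would simply read the rows in order, without splitting them into fields.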
Color pictures produce three such waveforms corresponding to the three primaries. These are then usually converted by linear combination into a luminance (monochrome brightness) component and two chrominance (hue and saturation) components. Multiplexing methods for further combining these components into a single composite waveform are well known and widely used; however, the luminance component usually takes up most of the channel capacity.
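As a rough sketch of such a linear combination, the following converts three primary signals into one luminance and two color-difference components and back again. The text does not name a specific matrix; the weights below are the ITU-R BT.601 luminance coefficients, used here as an assumption, and the function names are hypothetical.

```python
# Assumed luminance weights (ITU-R BT.601); the text does not specify a matrix.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_luma_chroma(r, g, b):
    """Linear combination of the primaries into luminance and two chrominance signals."""
    y = KR * r + KG * g + KB * b        # luminance (monochrome brightness)
    cb = (b - y) / (2 * (1 - KB))       # scaled blue color difference
    cr = (r - y) / (2 * (1 - KR))       # scaled red color difference
    return y, cb, cr


def luma_chroma_to_rgb(y, cb, cr):
    """Invert the linear combination above."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


# A neutral gray has luminance equal to its level and (nearly) zero chrominance.
y, cb, cr = rgb_to_luma_chroma(0.5, 0.5, 0.5)
print(y, cb, cr)
```

Because the transform is linear and invertible, no information is lost by the conversion itself; any saving comes from how the three components are subsequently handled.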

1.1.2 The Eye and Seeing

The eye is the organ of sight, having at its rear an inner nervous coating known as the retina. Rays of light pass through the cornea, aqueous humor, lens, and vitreous body to form an image on the retina. The central area of the retina, known as the fovea, provides high resolution and good color vision in about 1 degree of solid angle. The images on the retinas are sent along two optic nerves, one for each eye, until they meet at the optic chiasma, where half the fibers of each nerve diverge to opposite sides of the brain. This enables observations in three dimensions.
The eye behaves as a two-dimensional low-pass filter for spatial patterns, with a high-frequency cutoff of about 60 cycles per degree of foveal vision and significant attenuation below about 0.5 cycle per degree. Thus, high spatial frequencies in the image are not seen and need not be transmitted. The eye also acts as a temporal bandpass filter having a high-frequency cutoff between 50 and 70 Hz depending on viewing conditions. Flicker is more disturbing at high luminances and low spatial frequencies.
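To give a feel for what the 60 cycles per degree cutoff implies, the sketch below estimates how many pels across a picture would be enough to carry every spatial frequency the fovea can resolve. It assumes two samples per cycle (the Nyquist criterion) and treats sampling as uniform in visual angle, which is only approximate for wide pictures; the function name and the example viewing distance are hypothetical.

```python
import math

CUTOFF_CPD = 60.0  # foveal high-frequency cutoff quoted in the text, cycles per degree


def pels_needed(extent, viewing_distance, cutoff_cpd=CUTOFF_CPD):
    """Upper estimate of pels across a picture dimension of size `extent`.

    `extent` and `viewing_distance` must share the same unit.
    Assumptions: 2 samples per cycle (Nyquist), uniform sampling in angle.
    """
    angle_deg = math.degrees(2 * math.atan(extent / (2 * viewing_distance)))
    return 2 * cutoff_cpd * angle_deg


# Hypothetical example: picture height 1 unit, viewed from 6 picture heights.
lines = pels_needed(extent=1.0, viewing_distance=6.0)
print(round(lines))
```

The figure is an upper estimate: it applies the foveal cutoff to the whole picture, whereas the text notes that high resolution is confined to the central area of the retina.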
Noise and distortion are less visible at high-luminance levels than at middle- and low-luminance values, again depending on viewing conditions such as overall scene brightness and ambient room lighting. High- and low-frequency noise is less visible than mid-frequency noise. Distortions are also less visible near luminance transitions, such as occur at boundaries of objects in a scene. This is termed spatial masking, since the transitions mask the distortions.
Temporal masking also occurs. For example, shortly after a television scene change, the viewer is relatively insensitive to distortion and loss of resolution. This is also true of objects in a scene that are moving in an erratic and unpredictable fashion. However, if a viewer is able to track a moving object, then resolution and distortion requirements are the same as for stationary areas of a picture.

1.1.3 Subjective Assessment of Quality

As the variety of encoding algorithms increases so do the types of degradation perceived. If perception were thoroughly understood, the quality of reproduction of a particular video encoding strategy could be ascertained by objective measurements of signal parameters. The current situation is one of ad hoc objective measurements, each trying to relate subjective observations with each new encoding algorithm. Old methods of signal-to-noise ratio (SNR), spectral distance measures, pulse shapes, etc., are frequently inadequate.
To postulate a new objective measure, subjective testing must be done. Here tests are made on a small sample of the population, and by statistical methods the effect on the entire population is estimated. Subjective testing is controversial. Should simple grading, bad to excellent in five steps, or multidimensional analysis be used? What form should the test take: word text, carefully assembled sentences, natural dialog, type of picture detail, amount of motion, etc.?
However, what is even more in dispute is relating subjective testing results to objective measurements. Our inability to do this is a serious impediment both to communication between research scientists and to source encoding itself. Only when perception is properly understood will we have accurate objective measures. However, the day when we can, with confidence, objectively evaluate a new impairment without recourse to subjective testing seems very remote.
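One simple way to carry out the estimation step mentioned above is to average five-step grades from a small viewer panel and attach a confidence interval. The sketch below is only an illustration: the grades are made up, the numeric scale (1 for bad through 5 for excellent) is an assumption, and the interval uses a normal approximation (z = 1.96), which understates the uncertainty for very small panels.

```python
import math
import statistics


def mean_opinion_score(grades, z=1.96):
    """Return the mean grade and an approximate 95% confidence interval.

    Assumptions: grades run from 1 (bad) to 5 (excellent); the interval
    is a normal approximation around the sample mean.
    """
    mean = statistics.mean(grades)
    sem = statistics.stdev(grades) / math.sqrt(len(grades))  # standard error
    return mean, (mean - z * sem, mean + z * sem)


# Hypothetical grades from ten viewers, for illustration only.
grades = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]
mos, (low, high) = mean_opinion_score(grades)
print(mos, low, high)  # the mean of these ten grades is 3.8
```

A score of this kind summarizes one test condition; it says nothing by itself about how the grade relates to any objective measurement, which is the harder problem the text describes.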

1.1.4 Statistical Redundancy and Subjective Redundancy

If an information source such as a television camera produces statistically redundant data (that is, information that could just as well have been derived from past data), then a saving in transmission bit rate can result if the redundant information is removed prior to transmission. In most cases, this requires, at the transmitter, a capability for storing some of the past source output so that a decision can be made as to what is and what is not redundant in the present source output. Memory of past information is also required at the receiver so that the redundancy can be rederived and inserted back into the data stream in order to reproduce the original signal.
For example, in a television picture successive picture points (picture elements, or pels for short) along a line are very much alike, and redundancy reduction can be achieved by sending pel-to-pel differences instead of the pels themselves. The differences are small most of the time and large only occasionally. Thus, an average bit-rate saving can be obtained by using short binary words to represent the more probable, small differences and longer binary words to represent the infrequent, large differences. In successive frames a pel also changes very little on the average.
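The pel-to-pel differencing idea can be sketched as follows. The scan line is a made-up example, the function names are hypothetical, and first-order entropy is used as a stand-in for the average word length that a well-matched variable-length code could approach; no particular code is being attributed to the text.

```python
import math
from collections import Counter


def pel_differences(line):
    """Transmitter: send each pel as its difference from the previous pel."""
    prev = 0  # assumed starting value, known to both ends
    diffs = []
    for pel in line:
        diffs.append(pel - prev)
        prev = pel  # the transmitter remembers the last pel sent
    return diffs


def reconstruct(diffs):
    """Receiver: accumulate the differences to recover the pels."""
    prev = 0
    line = []
    for d in diffs:
        prev += d  # the receiver keeps the same memory as the transmitter
        line.append(prev)
    return line


def entropy_bits(symbols):
    """First-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical scan line: slowly varying brightness with one sharp edge.
line = [100, 101, 101, 102, 102, 102, 103, 150, 151, 151, 152, 152]
diffs = pel_differences(line)
print(diffs)  # [100, 1, 0, 1, 0, 0, 1, 47, 1, 0, 1, 0]
print(entropy_bits(line), entropy_bits(diffs))
```

On this toy line the differences cluster around a few small values, so their entropy is lower than that of the pels themselves, while the receiver still recovers the line exactly. The same structure applies to frame-to-frame differences, with a frame memory in place of the single stored pel.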
Statistical redundancy is not the only form of redundancy in a video signal. There is also considerable subjective redundancy; that is, information that is produced by the source, but that is not necessary for subjectively acceptable picture quality at the receiver. For example, it is well known that viewers are less sensitive to degradations near edges; i.e., large brightness transitions, in the picture....
