Auditory and Visual Pattern Recognition

David J. Getty, James H. Howard, Jr.

236 pages
English

About This Book

The systematic scientific investigation of human perception began over 130 years ago, yet relatively little is known about how we identify complex patterns. A major reason for this is that historically, most perceptual research focused on the more basic processes involved in the detection and discrimination of simple stimuli. This work progressed in a reductionist fashion, attempting to clarify fundamental mechanisms in depth before addressing the more complex problems of pattern recognition and classification. This extensive and impressive research effort built a firm basis from which to speculate about these issues. What seemed lacking, however, was an overall characterization of the recognition problem – a broad theoretical structure to direct future research in this area. Consequently, our primary objective in this volume, originally published in 1981, was not only to review existing contributions to our understanding of classification and recognition, but to project fruitful areas and directions for future research as well. The book covers four areas: complex visual patterns; complex auditory patterns; multi-dimensional perceptual spaces; theoretical pattern recognition.

Information

Publisher: Routledge
Year: 2017
ISBN: 9781315532592
Edition: 1

1 Pitch Perception: An Example of Auditory Pattern Recognition

Frederic L. Wightman

Introduction

Human listeners normally classify sounds on a number of subjective dimensions, among which are loudness, timbre, and pitch. The auditory processes that mediate these classifications are quite complex and, because the dimensions are purely subjective, research on these processes is difficult. In the case of loudness and timbre, research is aided by the fact that there are simple physical correlates of the perceptual experiences. Thus, although the exact functional relationship may be very complicated, it is clear that a change in stimulus intensity will generally cause a change in loudness, and alterations of spectral content will usually change the timbre of a sound. However, in spite of the obvious relationship between the frequency and pitch of a sinusoid, there is no simple physical correlate of pitch. Although pitch is definitely related to the frequency, or more generally, the periodicity of a sound, it has recently been recognized that the relationship is by no means a simple one.
The purpose of this chapter is first to demonstrate that classification of auditory signals in terms of their pitch is an extraordinarily complex process and, second, to argue that the process can conveniently be viewed as a kind of auditory pattern recognition. Finally, with reference to some recent results from my own laboratory, it is suggested that data from listeners with certain hearing impairments may help us to understand in more detail how the auditory system accomplishes this pattern recognition.

The Problem

The fundamental question motivating all research in psychophysics is: "What is the stimulus?" For the purpose of this chapter, that question reduces to: "What is it about a sound that determines its pitch?" There is an enticing simplicity to the question, but still no satisfactory answer. As early as the time of Pythagoras, it was thought that pitch was related simply to the frequency or period of a sound. Pythagoras noted that shorter strings, which vibrate more rapidly when plucked, produced a sound with a higher pitch than did longer strings. For simple sounds such as sinusoids, there is no problem, because waveforms with the same period generally do have the same pitch. However, it is relatively easy to produce stimuli that have very different periods yet still seem to have the same pitch. Moreover, the same pitch can be evoked by stimuli of widely varying spectral content (such as those produced by different musical instruments sounding the same note) and by stimuli with very different temporal fine structures (such as those produced by one instrument recorded at different places in a reverberant hall). The main problem, then, which any theory of pitch perception must address, is the invariance of pitch: the fact that many different transformations of the physical stimulus leave its pitch unchanged.
Accounting for invariance of one sort or another is just what most pattern-recognition schemes are designed to do. For this reason, it has proven useful to treat pitch extraction as a kind of auditory pattern recognition. A simple analogy may help to illustrate this point of view. The letter "A," as seen here, has its "A-ness" in common with the same letter printed by hand, in a newspaper, or anywhere else. In every case, it is recognized as the letter "A" in spite of great differences among the various physical representations of the letter (e.g., size, orientation, type style, etc.). Similarly, the musical note "middle C" has the same pitch regardless of the instrument that produces it. In both the visual case and the auditory case, the process of recognition and classification of the stimulus leaves the percept invariant in the face of large intrastimulus variability. Viewed in this way, pitch extraction is clearly a pattern-recognition process.

A Brief History

Subjective attributes of sound are difficult to study because they are not accessible by direct measurement. In contrast, physical features such as intensity and frequency are easily quantified. With modern laboratory gear, we can determine the intensity or frequency of a sound with an accuracy of better than 0.1%. Pitch, however, like loudness and timbre, cannot be measured directly, for it exists only in a listener's head. There is no such thing as a pitch meter. Most listeners can tell us only whether the pitches of two sounds are equal or whether the pitch of one is higher or lower than that of another. This has led to the use of simple matching procedures to provide at least an indirect measure of pitch.
Matching paradigms require listeners to compare the pitch of the sound in question with that of a reference sound. A pure tone (such as that produced by a sine-wave generator) is a convenient reference because pitch can then be defined in terms of the frequency of the tone. For example, we can say a sound has a pitch of 200 Hz, meaning that its pitch is equal to that of a 200 Hz sinusoid. Matching to a sinusoid is sometimes difficult, however, because many sounds of experimental interest have a rich spectrum, giving them a timbre that is very different from that of a pure tone. In these cases, we use a more complex reference sound, such as a periodic pulse train. Pitch matches between the secondary references and pure tones are usually straightforward. In the discussions that follow, the pitch of a given stimulus is defined as a certain frequency in Hz, implying that the pitch is either directly or indirectly equal to that of a pure tone at the same frequency.
The history of research on pitch perception can be divided into three distinct periods. The first, starting in the early 1840s and lasting almost a century, is characterized both by the completion of the first systematic studies of pitch perception and by the emergence of a simple theory, the familiar "place theory" (von Bekesy, 1960). The second period, from about 1940 to 1970, is distinguished by a number of experiments proving the inadequacy of the simple place theory and by the subsequent development of an alternative, which is sometimes called "fine-structure" theory (Schouten, 1940). By the early 1970s, it had become clear that modern experimental evidence simply could not support the fine-structure theory, and so the third period began with the introduction of three major new theories of pitch perception (Goldstein, 1973; Terhardt, 1974; Wightman, 1973). Each of these theories can be described loosely as a pattern-recognition theory.

Development of Place Theory: 1840–1940

Research during the early years focused on the importance, for pitch, of the lower harmonics of complex tones, specifically the fundamental. This attention to the fundamental was probably motivated by the fact that when we listen to musical tones, which are complex and periodic (or nearly so), and thus contain a number of harmonics, we actually hear just one tone, the pitch of which corresponds to the fundamental. Seebeck (1841) was one of the first to study this phenomenon systematically. He produced periodic stimuli with an acoustic siren, which consisted of a circular disk with holes punched around the perimeter. As the disk was rotated, compressed air was directed at the holes, and as each hole passed by the air source, an acoustic impulse was produced. With the holes regularly spaced around the disk, a stimulus consisting of a periodic sequence of impulses was generated. This stimulus has a spectrum consisting of lines at every integer multiple of the fundamental frequency. The fundamental is, of course, the reciprocal of the time between impulses. Seebeck noted that the stimulus produced a very strong pitch that corresponded to the fundamental frequency. Moreover, when the number of holes around the disk was doubled, thus halving the time between impulses and doubling the fundamental frequency, the pitch rose an octave. Based on these simple experiments, Seebeck concluded that pitch was determined either by the periodicity of the sound wave or by its fundamental frequency. A later experiment led Seebeck to favor the periodicity position and spawned a celebrated dispute between Seebeck and his contemporary, G. S. Ohm. In this controversial experiment, Seebeck generated stimuli with a disk on which the holes were not equally spaced. Instead, the time between air puffs alternated, t1, t2, t1, t2, etc., with t1 slightly different from t2.
The fundamental frequency of the resulting stimulus was, of course, the reciprocal of its period, t1 + t2. The pitch of the stimulus corresponded exactly to the fundamental frequency. Inasmuch as the spectrum contained very little energy at the fundamental frequency, Seebeck argued that the periodicity of the stimulus, rather than the presence of the fundamental, was the primary determinant of pitch. G. S. Ohm (1843), who firmly believed that a pitch could be heard only if the stimulus contained energy at the corresponding frequency (Ohm's "acoustical law"), suggested that Seebeck's conclusion was inappropriate because his stimulus actually did contain some energy at the fundamental. Seebeck replied that the pitch was much stronger than might be expected, given how little energy there was at the fundamental frequency. Ohm finally suggested that Seebeck was misled by what he called an acoustical "illusion."
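The spectral claims in Seebeck's two siren experiments can be checked numerically. The sketch below is illustrative only. It models each air puff as an ideal unit impulse sampled at an assumed 8 kHz rate, and it computes a plain DFT over a 0.1 s window. The helper names `impulse_train` and `line_spectrum` and the gap values (5 ms for the regular disk; 2.375 ms and 2.625 ms for t1 and t2) are hypothetical choices, not Seebeck's actual parameters.

```python
import cmath

FS = 8000  # sampling rate in Hz (an illustrative assumption)
N = 800    # 0.1 s window, so DFT bins are FS / N = 10 Hz apart


def impulse_train(intervals, n=N):
    """Idealized siren output: unit impulses separated by the given gaps (in samples), cycled."""
    x = [0.0] * n
    pos, j = 0, 0
    while pos < n:
        x[pos] = 1.0
        pos += intervals[j % len(intervals)]
        j += 1
    return x


def line_spectrum(x, fs=FS, rel_tol=1e-6):
    """Map frequency (Hz) -> DFT magnitude for every bin up to Nyquist with non-negligible energy."""
    n = len(x)
    pulses = [i for i, v in enumerate(x) if v]  # only the impulse samples contribute
    mags = {k * fs / n: abs(sum(cmath.exp(-2j * cmath.pi * k * i / n) for i in pulses))
            for k in range(1, n // 2 + 1)}
    peak = max(mags.values())
    return {f: m for f, m in mags.items() if m > rel_tol * peak}


regular = line_spectrum(impulse_train([40]))          # a puff every 5 ms
doubled = line_spectrum(impulse_train([20]))          # twice as many holes
print(min(regular), min(doubled))                     # 200.0 400.0: up one octave
alternating = line_spectrum(impulse_train([19, 21]))  # t1 = 2.375 ms, t2 = 2.625 ms
print(min(alternating))                               # 200.0, i.e. 1 / (t1 + t2)
print(alternating[200.0] / alternating[400.0])        # roughly 0.08: a weak fundamental
```

A regular train yields lines at every multiple of 200 Hz, and halving the gap moves the lowest line up an octave. The alternating train keeps its lowest line at 1/(t1 + t2). However, that line is about 22 dB weaker than the one at 400 Hz, which is the situation Seebeck and Ohm disputed.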
Twenty years later, Helmholtz (1863) offered a possible resolution of the controversy. In his monumental book, On the Sensations of Tone as a Physiological Basis for the Theory of Music, Helmholtz strongly supported Ohm's position. He provided both a possible physiological basis for the spectral analysis of sound that Ohm's "law" required and a possible physical explanation of Seebeck's "illusion." First, Helmholtz suggested that the basilar membrane inside the cochlea contained transversely stretched fibers, much like the strings of a harp, with each fiber resonant to a different frequency. Thus, the cochlea would function as a crude spectral analyzer. Second, Helmholtz argued that the transduction of sound from the eardrum to the cochlea was a nonlinear process and that incoming sound waves would be distorted. Distortions such as those Helmholtz proposed would introduce spectral components that were not present in the original sound, but which would be analyzed by the cochlea in the same way. With the complex stimuli produced by Seebeck's sirens, the additional components would appear at frequencies given by the frequency difference between the components in the original stimulus. In all the cases studied by Seebeck, most importantly the third controversial case, this frequency is the fundamental. In other words, nonlinear distortion would be expected to add a component to the stimulus at the frequency corresponding to the perceived pitch, before the stimulus was analyzed by the cochlea. Thus, Helmholtz's distortion hypothesis could explain Seebeck's "illusion" within the framework of Ohm's "law."
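Helmholtz's distortion argument can be illustrated with a toy nonlinearity. The chapter does not specify a form for the distortion, so the sketch below assumes, purely for illustration, a memoryless quadratic distortion y = x + a*x^2 with a = 0.1. The helpers `two_tone`, `distort`, and `amplitude_at` are hypothetical. The sketch passes the 3rd and 4th harmonics of 200 Hz through this distortion and reads single DFT bins to show that a component appears at their difference frequency, 200 Hz, which was absent from the input.

```python
import cmath
import math

FS = 8000  # sampling rate in Hz (an illustrative assumption)
N = 800    # 0.1 s window; every frequency used below falls exactly on a 10 Hz DFT bin


def two_tone(f1, f2, n=N, fs=FS):
    """Two equal-amplitude cosines, e.g. neighbouring harmonics of a missing fundamental."""
    return [math.cos(2 * math.pi * f1 * i / fs) + math.cos(2 * math.pi * f2 * i / fs)
            for i in range(n)]


def distort(x, a=0.1):
    """Hypothetical memoryless quadratic nonlinearity y = x + a*x^2 (not Helmholtz's own model)."""
    return [v + a * v * v for v in x]


def amplitude_at(x, freq, fs=FS):
    """Amplitude of the cosine component of x at freq (Hz), read from a single DFT bin."""
    n = len(x)
    k = round(freq * n / fs)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
    return 2 * abs(coeff) / n


tone = two_tone(600, 800)                # 3rd and 4th harmonics of 200 Hz
print(amplitude_at(tone, 200))           # essentially 0: no 200 Hz energy at the input
print(amplitude_at(distort(tone), 200))  # approximately 0.1 (= a): the difference tone f2 - f1
```

Besides the difference tone, the squared term also produces a DC offset and components at 2f1, 2f2, and f1 + f2. For any two adjacent harmonics of a periodic sound, the difference tone falls at the fundamental, which is the property Helmholtz's explanation relied on.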
Helmholtz's hypotheses went unchallenged during the next 75 years. But in 1924, Seebeck's experiments were replicated and extended by Harvey Fletcher, who used electronic equipment to generate and control the complex stimuli (Fletcher, 1924). Fletcher's results completely corroborated Seebeck's. In fact, Fletcher found that even if the fundamental and several additional lower harmonics were completely removed from the acoustic stimulus, the pitch still corresponded exactly to the fundamental. Fletcher relied on Helmholtz's distortion hypothesis to explain this phenomenon, which was later called "the problem of the missing fundamental."
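Fletcher's observation can be restated in the time domain: removing low harmonics changes the spectrum but not the repetition period of the waveform. The sketch below finds that period with a simple normalized autocorrelation. This is a generic periodicity estimate, not a technique described in this chapter and not a model of pitch perception. The sampling rate, window length, threshold, and helper names (`harmonic_complex`, `estimate_period`) are all assumptions.

```python
import math

FS = 8000  # sampling rate in Hz (an illustrative assumption)
N = 800    # 0.1 s of signal, a whole number of 5 ms periods


def harmonic_complex(f0, harmonics, n=N, fs=FS):
    """Equal-amplitude cosine harmonics of f0: an idealized complex tone."""
    return [sum(math.cos(2 * math.pi * h * f0 * i / fs) for h in harmonics)
            for i in range(n)]


def estimate_period(x, fs=FS, threshold=0.99):
    """Shortest lag (in seconds) at which the circular autocorrelation, normalized by
    its zero-lag value, reaches `threshold`, i.e. where the waveform repeats itself."""
    n = len(x)
    r0 = sum(v * v for v in x)
    for lag in range(1, n):
        r = sum(x[i] * x[(i + lag) % n] for i in range(n)) / r0
        if r >= threshold:
            return lag / fs
    return None


full = harmonic_complex(200, range(1, 11))     # harmonics 1-10 of 200 Hz
missing = harmonic_complex(200, range(4, 11))  # fundamental and harmonics 2-3 removed
for x in (full, missing):
    print(round(1 / estimate_period(x)))       # 200 in both cases
```

The circular autocorrelation is exact here only because the 0.1 s window contains a whole number of 5 ms periods. A real recording would need a windowed, non-circular estimate.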
In the late 1920s, Georg von Bekesy provided firm evidence for the existence of the cochlear spectrum analyzer required by Helmholtz's theory (von Bekesy, 1928, in von Bekesy, 1960). In a series of experiments in which he actually observed the movement of the basilar membrane, von Bekesy showed that the place of maximum vibration of the membrane changed in an orderly way as the frequency of the sound was varied. Low-frequency sound caused a maximum at one end of the membrane, and high-frequency sound produced a maximum at the other end. Although the details of von Bekesy's discovery differed from those originally proposed by Helmholtz, it was clear that the cochlea did indeed perform the spectral analysis required by Helmholtz and Ohm's theory of pitch perception. Because this theory required that a corresponding "place" on the basilar membrane be stimulated in order for a pitch to be heard, it came to be known as "place theory." It was not until the late 1930s that the inadequacies of the theory were recognized.

Development of Fine-Structure Theories: 1940–1970

The place theory held that a necessary condition for the perception of pitch is stimulation in the cochlea by the corresponding spectral component. In the case of complex tones in which the fundamental was absent from the acoustic stimulus, the corresponding spectral component (the fundamental) was thought to be reintroduced by nonlinear processes in the middle or inner ear. Once reintroduced, the distortion product would be expected to behave just like a simple tone at that frequency.
The demise of place theory was brought about by two simple demonstrations. The first was that the required distortion product was either absent or very weak, and the second was that even if the distortion product was present it could not be the primary mediator of t...
