Modularity and the Motor theory of Speech Perception

Proceedings of A Conference To Honor Alvin M. Liberman

Michael Studdert-Kennedy, Ignatius G. Mattingly

480 pages, English, ePub (mobile friendly)

About This Book

A compilation of the proceedings of a conference held to honor Alvin M. Liberman for his outstanding contributions to research in speech perception, this volume deals with two closely related and controversial proposals for which Liberman and his colleagues at Haskins Laboratories have argued forcefully over the past 35 years. The first is that articulatory gestures are the units not only of speech production but also of speech perception; the second is that speech production and perception are not cognitive processes, but rather functions of a special mechanism. This book explores the implications of these proposals not only for speech production and speech perception, but for the neurophysiology of language, language acquisition, higher-level linguistic processing, the visual perception of phonetic gestures, the production and perception of sign language, the reading process, and learning to read. The contributors to this volume include linguists, psycholinguists, speech scientists, neurophysiologists, and ethologists. Liberman himself responds in the final chapter.


Information

Year
2014
ISBN
9781317785057

Chapter 1
Introduction: Speech Perception

Franklin S. Cooper
Haskins Laboratories
Welcome to the Conference on Modularity and the Motor Theory of Speech Perception. It is a real pleasure to see so many old friends and to greet those of you whom I have known only by reputation—a pleasure, too, to welcome you graduate students on whom the future of speech research depends. If you are wondering about the viability of a field of research that is already honoring one of its pioneers, the papers you are about to hear will make it clear, I think, that there are more problems ahead of you than there are solutions behind us greybeards. For example: Modularity and the Motor Theory. So welcome to the intellectual challenges as well as to this conference!
To Al Liberman, who is himself an old hand at conferences, this one must be something of a novelty: It was arranged for him, not by him! It is entirely appropriate that Haskins Laboratories should wish to honor him. Al has been a coworker and a cobeliever in Haskins Laboratories ever since he joined it in 1944 and a continuing inspiration to all of us, both personally and intellectually. He still wanders the halls asking, "What have you discovered today?" It is doubly appropriate that he be honored by a conference on Modularity and the Motor Theory of Speech Perception, since these ideas have been central to his own work and to the many contributions he has made to speech research. I could say more—much more—in the same vein but will limit it to one personal comment: To me, Al has been a friend, and I am the one honored.
Let me consider with you some simple-minded questions. How does it happen that we are here to talk about the Motor Theory of Speech Perception? (I shall leave Modularity aside for a moment.) Part of the answer lies in the history of the field, and as we probe that history—for the benefit of you younger people—we shall find even prior questions. Thus, talking about a theory implies some kind of problem for that theory to explain. Was there such a problem? This may seem a strange thing to ask, since the question of how speech is perceived has been a thorny problem for as long as most of you can remember. Nearly as ancient is the Motor Theory as a proposed solution.
But there was a time when even the problem did not exist—or was not known to be a problem. In the same sense, gravity was not a problem before Newton's time: Everybody knew that apples fell down just as everything else did. So likewise the perception of speech posed no special problem; it and other sounds were heard and recognized all in the same general way.
Let me press the parallel a little farther: Neither Isaac Newton nor Alvin Liberman discovered his problem until it fell on him. Newton can now be dismissed, though we should note that it is not every man of science who provides his own problem as well as its solution.
Back to Al and how he discovered his problem: Namely, how is speech perceived? He did not begin with speech. The problem that he and I were working on at the end of World War II was the practical one of designing a reading machine for blinded veterans. Our approach was simple and direct: The machine would scan a line of type and convert the distinctive letter shapes into distinctive sound shapes which the blind reader would, with practice, come to recognize—and so to read printed books by ear.
The difficulty that we encountered—as did others before and after us—was that the reading rates were so painfully slow, even after hours and hours of practice, that no one would use the device. We tried many things to make the sounds more distinctive and more easily learned, but reading rates were no better and often worse. Most frustrating was that the performance of our subjects when identifying our machine-made words was much poorer than their performance when identifying nonsense words, spoken by a person.
Thus did Al's problem come down upon him: Finally, he realized that the right question was not why machine-made sounds are so poor but rather why man-made sounds are so good. What is so special about speech that makes its perception so easy?
He then supposed that speech was just a better acoustic alphabet—that it took the phonetic string of a sentence and spelled it out with unit sounds that could be heard easily and rapidly, because they flowed together into words. By studying these unit sounds of speech, he might be able to design a better set of sounds for the reading machine.
But by this time, the Potter, Kopp, and Green (1947) collection of spectrograms had been published, and one could see that finding acoustic invariants for the phonemes would not be so easy. One could pick out some of the acoustic consequences of articulation, but where in all this complex pattern were the acoustic cues for perceiving the individual speech sounds known to be lurking there?
This search for the acoustic cues was the task that Al, Pierre Delattre, and I undertook in the early 1950s using spectrograph and pattern playback. What we found was well known at the time and is still available in the literature. Cues there were—in abundance and extreme diversity. Before the end of the decade, most of them had been found and organized into rules for synthesis that generated quite intelligible speech (Liberman, Ingemann, Lisker, Delattre, & Cooper, 1959).
But it was the diversity and curious character of the cues that needed a better explanation than current auditory theories of perception could provide. The cues for a particular speech sound seemed to make sense only when one considered how that sound had been articulated. Al made these arguments explicit in his 1957 (Liberman, 1957) review paper and offered a motor theory to explain why speech is so exceptionally efficient as a carrier of messages.
Thus history, not logic, is the principal reason we are here to talk about a motor theory of speech perception rather than a motor theory of speech production.
There were other reasons, too. There was then a bias—which still persists— toward thinking about speech as "that which goes into the ear" rather than "that which comes out of the mouth." Little wonder, since the ear and its roots in the brain are so much more elegant and mysterious than the mouth's crossed-up plumbing and ventilating systems, which can't even breathe and swallow at the same time! Then, too, instrumentation was largely lacking for research on production.
Let me add as an aside that although Al continued to focus on the perception of speech and its many unique characteristics, there were some of us here who did start, in the late 1950s, to look for phonological structure on the production side. The Laboratories still has a major program ongoing in this area, and we are by no means alone.
Now, what would be different if we were talking about a motor theory of speech production instead of a motor theory of speech perception? Surely there must be close linkages between the two processes and their mechanisms unless, indeed, a single mechanism performs both functions. But whatever the internal structure of the speech module (or modules), the input and output signals are very different in kind and structure. This calls for a restructuring operation somewhere in the sequence—one that may put tighter constraints on a model for the speech module than do either perception or production.
So another question: should we perhaps be talking about a motor theory of speech per se, where "speech" stands for "communication by voice?" This would emphasize the communicative function that is served by both perception and production. Moreover, it would give central place to the operation that ensures error-free regeneration of spoken messages, even when repeated many times.
You may object to so much emphasis on the relaying of spoken messages from person to person, since it is so rarely done. The point is that it can be done; the mechanism is in place and in use for other purposes. Long ago, this kind of relaying was common; indeed, speech—aided by rhyme—served to repeat epic poems intact across the ages. The trick, just as with long-distance telephony now that it has gone digital, is to regenerate the signal each time it is relayed. The incoming signal, contaminated with noise and distortion, is replaced by a shiny new signal in canonical form. For humans, the regenerated signals serve a further purpose: They are just what is needed for memory, since the bit rate for identifying the message units is so much less than for describing the incoming sounds.
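The regeneration idea can be made concrete with a small simulation. The following Python sketch is not from the text: the canonical values, noise level, and hop count are all assumptions chosen purely for illustration. Each relay hop adds noise to the signal; a regenerating relay snaps the incoming signal back to the nearest canonical unit before passing it on, so errors do not accumulate, whereas a non-regenerating ("analog") relay lets the distortion pile up.

```python
import random

# Hypothetical canonical "unit" values; real speech units are gestures,
# not numbers -- this is only a toy model of regeneration.
CANONICAL = [0.0, 1.0, 2.0, 3.0]

def regenerate(value):
    """Replace a noisy incoming value with its nearest canonical form."""
    return min(CANONICAL, key=lambda c: abs(c - value))

def relay(message, hops, regenerating, noise=0.1, seed=42):
    """Pass a message through a chain of noisy relays."""
    rng = random.Random(seed)
    signal = list(message)
    for _ in range(hops):
        signal = [v + rng.gauss(0, noise) for v in signal]  # channel noise
        if regenerating:
            signal = [regenerate(v) for v in signal]        # restore canonical form
    return signal

message = [0.0, 2.0, 1.0, 3.0, 1.0]
digital = relay(message, hops=50, regenerating=True)   # regenerated each hop
analog = relay(message, hops=50, regenerating=False)   # noise accumulates
```

With regeneration the message survives fifty hops intact; without it, the per-hop noise accumulates (its standard deviation grows with the square root of the number of hops) and the original canonical units are lost.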
Regeneration is only one of several names for the function I have been talking about. Categorization is an essential part of the function, and with labeling included it provides the recognition stage in models of speech perception. Restructuring, or recoding, are also closely related terms. In models of speech production, the generative part of regeneration corresponds to setting up motor plans or coordinative structures. I have used the term "regeneration," because it relates to both input and output and implies the communicative function of which it is an essential part.
Clearly, regeneration also implies units. In their canonical form, these would be the "intended gestures" of the motor theory. But surely these are only a subset of all possible gestures, so what constrains the choice? Speed of execution is one requirement. In fact, people can and do talk at rates of up to fifteen or so units per second—which seems impossibly high for such slow machinery as tongue and jaw. So we should not expect speech gestures to conform to our usual notion of a completed movement such as a nod of the head or a wave of the hand. No amount of coarticulation between such gestures (i.e., overlap along the time line), would crowd them into the time allowed.
But coarticulation across the time line could do it. Given the several articulators that we have and their potential for independent and concurrent action, the total system could achieve a succession of discrete states—nameable as phonemes or intended gestures—and so attain a kind of phase velocity much higher than that of the individual articulators. It may be comforting to note that this way of looking at speech—searching for coincidences and alignments during ongoing gesturing—conforms to the cosmic strategy whereby astrologers seek our destinies in planetary alignments.
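The timing argument in the last two paragraphs is simple arithmetic, and a back-of-the-envelope sketch makes it explicit. The 250 ms gesture duration and the count of four independent articulators below are assumed round numbers, not figures from the text: a single articulator completing one gesture at a time tops out at about 4 units per second, while several articulators moving concurrently, their gestures staggered along the time line, can define successive discrete states at the observed rate of fifteen or so per second.

```python
# Assumed round numbers for illustration; not measurements from the text.
gesture_duration = 0.25   # seconds for one completed articulator movement
n_articulators = 4        # e.g. lips, tongue tip, tongue body, velum

# Serial: gestures queued one after another on a single articulator.
serial_rate = 1 / gesture_duration             # units per second

# Staggered: articulators act concurrently, so a new discrete state
# (a new alignment of ongoing gestures) is reached far more often --
# the "phase velocity" exceeds that of any individual articulator.
staggered_rate = n_articulators / gesture_duration

print(serial_rate, staggered_rate)  # 4.0 16.0
```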
Another constraint on the choice of gestures is the fairly obvious one that they must have acoustic consequences. Preferably, the consequences would be as strong and distinctive as they are for [s] and [ʃ], but given the nature of the gestures, most of the sounds are necessarily variable with context and some, to round out the inventory, are even as feeble and confusable as [f] and [θ].
A more demanding requirement is that the units be permutable. Thus, assuming speech to be a succession of discrete states that progresses from one intended gesture to the next, then the set of possible "next gestures" from any particular state is small and sharply constrained. It is limited—not by phonological rules—but by circumstances such as that some of the articulators are already in mid-movement and must, therefore, continue moving in the next gesture. In a more general way, one of the prices of parallelism is that there is no way to extract a time slice without leaving rough edges, so shuffling its position means finding a place where the edges will match.
It might be useful to turn this argument on its head and use the permutability requirement to reinterpret our knowledge of how real phonemic units combine and recombine. That could help us to arrive at physiological descriptions of the "intended gestures."
Much of what I have been saying has dealt with the constraints that particular processes put on models for speech. Let me now try a different tack and ask about minimal constraints on the speech signal at various stages of the communicative process: Thus, what requirements at the very least must the unit signals of speech meet, if they are to be useful in perception, in production, and in such intermediate processing as may be needed to link perception and production? And, having asked these questions, let me propose answers: For perception, the signals must at the very least be audible; for production, they must be utterable; and for the intermediate processing, they must be both regenerable and permutable; it would help, if they were also memorable. The moral I would draw is obvious: The constraints that really bind are the need to regenerate and the need to permute the signal units.
Finally, let me return to my original question, slightly sharpened: We are, in fact, met here to talk about Modularity and the Motor Theory of Speech Perception. Does that emphasis on perception mean that we are "barking up the wrong tree?" Like most simple-minded questions, this one has two answers: YES, if we suppose that perception is all-important, or that it can be dealt with in isolation. NO, if we consider that perception by itself is a very large topic for a single conference, and if we remember that the models we build for perception must be compatible with the rest of the communicative process; that is, they must honor the Throughput Principle: That which goes in at the ear, and out from the mouth, must somehow go through the head.

References

Liberman, A. M. (1957). Some results of research on speech perception. Journal of the Acoustical Society of America, 29, 117-123.
Liberman, A. M., Ingemann, F., Lisker, L., Delattre, P. C., & Cooper, F. S. (1959). Minimal rules for synthesizing speech. Journal of the Acoustical Society of America, 31, 1490-1499.
Potter, R. K., Kopp, G. A., & Green, H. (1947). Visible speech. New York: Van Nostrand.

Chapter 2
The Status of Phonetic Gestures

Björn Lindblom
Department of Linguistics, University of Texas, and University of Stockholm
Abstract
In this chapter, I shall argue that speakers adaptively tune phonetic gestures to the various needs of speaking situations (the plasticity of phonetic gestures) and that languages make their selection of phonetic gesture inventories under the strong influence of motor and perceptual constraints that are language independent and in no way special to speech (the functional adaptation of phonetic gestures). These points have implications for a number of issues on which the Motor Theory takes a stance. In particular, the evidence reviewed challenges two assumptions that are central to the Motor Theory—those of modularity and gestural invariance. First, if phonetic gestures possess invariance at the level of motor commands, and listeners are able to perceive such gestural invariance, why is speech production so often nevertheless under output-oriented control? Second, the Motor Theory assumes that speech perception is a biologically specialized process that bypasses the auditory mechanisms responsible for the processing of nonspeech sounds. It also assumes that the motor system for vocal tract control exhibits specialized adaptations. If so, why do inventories of vowels and consonants nevertheless show evidence of being optimized with respect to motoric and perceptual limitations that must be regarded as biologically general and not at all special to speaking and listening?
There are two aspects of phonetic gestures that merit special attention in the context of the Motor Theory (MT; Liberman & Mattingly, 1985). One striking fact comes from observations of how speech is produced: A large body of experimental evidence suggests that phonetic gestures are highly malleable and adaptive. They exhibit plasticity.
The second point emerges from cross-linguistic data on how languages select gestures to build segment inven...
