Speech Production
eBook - ePub

Speech Production

Models, Phonetic Processes, and Techniques

  1. 404 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Speech Production: Models, Phonetic Processes and Techniques brings together researchers from many different disciplines - computer science, dentistry, engineering, linguistics, phonetics, physiology, psychology - all with a special interest in how speech is produced. From the initial neural program to the end acoustic signal, it provides an overview of several dominant models in the speech production literature, as well as up-to-date accounts of persistent theoretical issues in the area. A particular focus is on the evaluation of information gleaned from instrumental investigations of the speech production process, including MRI, PET, ultra-sound, video-imaging, EMA, EPG, X-ray, computer simulation - and many others. The research presented in this volume considers questions such as: the feed-back vs. feed-forward control of speech; the acoustic/auditory vs. articulatory/somato-sensory domains of speech planning; the innateness of human speech; the possible architecture of a speech production model; and the realization of prosodic structure in speech. Leaders in speech research from around the world have contributed their most recent work to this volume.

Speech Production by Jonathan Harrington and Marija Tabain is available in PDF and/or ePUB format.

PART 1
Models
1
About Speech Motor Control Complexity
PASCAL PERRIER

ABSTRACT

A key issue in research on speech motor control is the level of complexity required of the internal models: must these models account accurately for all physical properties of the speech motor system, including the complex tongue-jaw biomechanics? Or would simpler internal representations be sufficient, ones that model only the static characteristics of the peripheral speech apparatus or that give a rough account of articulatory dynamics? On the basis of experimental and modeling studies of speech movements and human limb movements published in the literature, the adequacy of simplified internal representations for speech motor control is analyzed.

INTRODUCTION

In the past two decades, the analysis of speech motor control and its modeling have largely been inspired by investigations and models of other skilled human movements, such as reaching, pointing, or grasping. This research approach has proved to be efficient, and it allowed the emergence of a number of speech control models which served as theoretical backgrounds for numerous studies (see among others Guenther, 1995; Laboissière, Ostry, & Feldman, 1996; Ostry, Gribble, & Gracco, 1996; Perkell et al., 1997, 2000; Perrier, Lœvenbruck, & Payan, 1996; Saltzman, 1986; Saltzman & Munhall, 1989). Consequently, the issues raised today in the domain of motor control research are very important for future studies of speech motor control, and addressing them is a prerequisite for the improvement of the current models. Among these issues, two questions are of particular interest: (1) How complex are the acquired internal representations of the motor system, built up during the acquisition of the motor task, and used to plan and achieve the movement? and (2) What role does low-level, short-latency feedback play in achieving an accurate and stable movement control? In this paper, contributions to these issues are proposed, while taking into account some important peculiarities of the speech production task.

SPEECH PRODUCTION: A COMPLEX MOTOR TASK

Compared to many other human motor tasks classically studied in motor control research, speech production has a number of peculiarities that make it particularly complex. Some findings suggesting such a complexity were already frequently discussed in the literature:
1. Because of its semiotic nature, the goal of speech production is actually defined in an abstract domain. Hence, its physical characterization is not straightforward, and this has two consequences. First, there is no unique physical correlate for a given elementary speech sound, and a large variability of patterns can be observed in the neurophysiological, articulatory, and acoustic domains (see Perkell & Klatt, 1986). Second, the issue of the physical space in which the motor task is planned becomes particularly complex, since the distal space can be defined either by articulatory positions, or by spectral properties of the speech signal, or by perceptual characteristics of this signal (see Browman & Goldstein, 1990; Guenther, Hampson, & Johnson, 1998; Savariaux, Perrier, & Orliaguet, 1995; Stevens, 1989; Tremblay, Shiller, & Ostry, 2003), or by a multimodal space associating orosensory, auditory, and even visual characterizations.
2. Speech production has a large number of degrees of freedom that confer a many-to-one characteristic on the relationships between motor commands, articulatory positions, and acoustic or auditory properties. This characteristic, together with the above-mentioned intrinsic variability of the physical correlates of the production of a given sound, has the consequence that a large set of motor equivalence strategies can be used to implement a range of coarticulation strategies or to deal with artificial perturbations (such as a pipe or food in the mouth), or pathological perturbations (such as tongue or mandible surgery) or peripheral perturbations (Guenther, Espy-Wilson, Boyce, Matthies, Zandipour, & Perkell, 1999; McFarland, Baum, & Chabot, 1996; Perkell, Matthies, Svirsky, & Jordan, 1993; Savariaux, Perrier, Orliaguet, & Schwartz, 1999). These multiple strategies obviously contribute to the complexity of speech motor control.
3. Compared to other skilled human motor activities, speech movements in normal conditions can be very short, since vowels have a mean duration of approximately 80 ms and consonants have mean durations around 40 ms (O'Shaughnessy, 1981). These characteristics seem to exclude any potential online contribution of long-latency orosensory feedback that would be processed by the cortex, and to limit the role of auditory feedback to a suprasegmental level and to an a posteriori monitoring used to correct segmental aspects of speech after it has been produced (Perkell et al., 1997). The absence of online use of auditory feedback to control speech at a segmental level is well supported by experimental work showing that speakers can produce intelligible speech even after hearing loss (Lane & Wozniak, 1991; Manzella, Wozniak, Matthies, Lane, Guiod, & Perkell, 1994). Consistent with this, work on stutterers and normal speakers shows that delaying auditory feedback in the range of 50–200 ms affects prosodic features (speaking rate, fluency, rhythm, intonation, and stress) rather than segmental features (Hargrave, Kalinowski, Stuart, Armson, & Jones, 1994; Stager & Ludlow, 1993). At the same time, speech gestures have to be accurate enough to ensure that the associated acoustic signal can be correctly perceived by a listener. How accuracy can be obtained without the use of long-latency feedback that would be processed by the cortex is a key issue for speech production research, but not for other human motor tasks except for eye saccades.
To deal with the complexity and the multimodality of speech task representations, with the numerous motor or auditory equivalence strategies and with the accuracy requirements in the absence of long-latency feedback going through the cortex, the large majority of speech motor control models published in the literature assume the existence of internal representations of the speech apparatus, called internal models (Guenther, 1995; Hirayama, Vatikiotis-Bateson, Kawato, & Jordan, 1992; Jordan, 1990; Jordan & Rumelhart, 1992; Kawato, Maeda, Uno, & Suzuki, 1990; Laboissière, Schwartz, & Bailly, 1991; Perkell et al., 1997, 2000; Perrier, Payan, & Marret, 2004).

INTERNAL MODELS AND SPEECH PRODUCTION CONTROL

A Useful Concept to Deal with the Control of Complex Motor Tasks

The internal model concept was proposed in motor control research to deal with the lack of a one-to-one mapping (nonbiunivocity) between motor commands and the position of the final effector, and with the delays associated with the processing of long-latency feedback. The basic hypothesis is that copies of the motor system, or of subsets of it, could be learned in the brain during the acquisition of the motor skill (in our case in the speech learning phase). Once they are learned, these models could be used to predict the consequences of motor command changes on the trajectory of the final effector. According to different existing models of human motor control published in the literature, the role of internal models in the execution of the motor task could take different forms.
A first hypothesis suggests that internal models' predictions could be the basis of task-planning strategies aiming at ensuring that the final effector moves along a specific desired trajectory in the task space. This is the so-called desired trajectory hypothesis (Kawato, 1999). Since human motor systems usually have an excess of degrees of freedom, many different motor command sequences are likely to allow the achievement of the specified trajectory. The basic idea of the desired trajectory hypothesis is that the central nervous system would use internal models prior to the execution of the movement, in order to select an optimal one from all possible motor command sequences that would both generate the required trajectory and minimize a motor criterion, classically related to the concept of effort. This kind of internal model is classically called an inverse internal model, since it allows one to go from the desired output to the motor commands.
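The selection step described above can be illustrated with a deliberately small sketch. It is not taken from the chapter or from any of the cited models: the linear plant, the `GAINS` values, and the function names are assumptions chosen only to show how an inverse model might pick, among the many command vectors that reach the same task-space position, the one with the least effort (here, the smallest sum of squared commands).

```python
# Toy plant (an assumption for illustration only): three "motor commands" are
# weighted by fixed gains and summed to give one task-space position, so many
# different command vectors produce the same position (many-to-one mapping).
GAINS = [1.0, 2.0, 0.5]  # hypothetical gains, not values from the chapter


def forward_model(commands):
    """Predict the effector position produced by a command vector."""
    return sum(g * u for g, u in zip(GAINS, commands))


def effort(commands):
    """Effort criterion assumed here: sum of squared commands."""
    return sum(u * u for u in commands)


def inverse_model(position):
    """Return the least-effort command vector that reaches `position`.

    For this linear plant the minimum-norm solution has a closed form:
    u_i = g_i * position / sum(g_j ** 2).
    """
    scale = position / sum(g * g for g in GAINS)
    return [g * scale for g in GAINS]


# Plan commands for every sample of a desired task-space trajectory.
desired_trajectory = [0.0, 1.0, 2.0, 3.0]
planned_commands = [inverse_model(p) for p in desired_trajectory]

# Another command vector that also reaches the final position, but by
# driving the first command alone, which costs more effort.
alternative = [desired_trajectory[-1], 0.0, 0.0]
```

In this toy case both `planned_commands[-1]` and `alternative` reach the final position when passed to `forward_model`, but the planned vector has the lower `effort`. That is the sense in which the inverse model resolves the excess of degrees of freedom.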
For target-oriented movements, an alternative hypothesis suggests that the central nervous system would use internal models as direct models during the planning to optimize the positioning accuracy of the final effector at the target in the presence of neural noise (Harris & Wolpert, 1998). From this perspective, internal models would also be used prior to the execution of movement, but the trajectory of the final effector toward the target would be a consequence of the planning strategy rather than the specification of the task itself. Another, more recent use of direct internal models was proposed by Todorov and Jordan (2002) in the framework of their optimal control feedback model. According to this model, the motor control strategy would not consist in specifying a desired trajectory in the task space and in selecting the appropriate commands for the motor system to follow this trajectory. It would instead make use of feedback information during the execution of movement to selectively modify motor commands in an optimal way, when deviations in the task space occur that would endanger the achievement of the task goal. The proposed use of feedback would require long-delay loops that are known to generate instabilities in closed-loop servomechanisms and that would consequently induce inadequate corrections from the controller. To overcome these problems, Todorov and Jordan (2002) have proposed using the outputs of the internal models as afferent signals.
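The instability caused by delayed feedback, and the way a forward model's output can stand in for the afferent signal, can be sketched with a toy simulation. This is not Todorov and Jordan's optimal control model: the integrator plant, the proportional controller, and the values of `gain` and `delay` are all assumptions chosen for illustration, and the forward model is assumed to be perfect and noise-free.

```python
def simulate(use_forward_model, steps=60, delay=3, gain=0.8, target=1.0):
    """Drive a toy integrator plant toward `target` with proportional control.

    Assumed plant: position[t + 1] = position[t] + command[t].
    Without the forward model, the controller only sees the position from
    `delay` steps ago (delay must be >= 1). With it, the controller uses
    the position predicted from its own past commands.
    """
    position = 0.0
    predicted = 0.0            # forward-model estimate of the position
    in_transit = [0.0] * delay  # positions still travelling in the slow loop
    trajectory = []
    for _ in range(steps):
        sensed = predicted if use_forward_model else in_transit[0]
        command = gain * (target - sensed)
        # The current position enters the slow loop; the oldest one leaves.
        in_transit = in_transit[1:] + [position]
        position += command    # the plant integrates the command
        predicted += command   # the forward model applies the same update
        trajectory.append(position)
    return trajectory


delayed_only = simulate(use_forward_model=False)
with_model = simulate(use_forward_model=True)
```

With these assumed settings, `delayed_only` oscillates around the target with growing amplitude, whereas `with_model` settles on the target. The same gain is destabilizing when corrections rely on late sensory signals and harmless when they rely on the internal prediction.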
Thus, from the perspective of motor control models such as the desired trajectory hypothesis, or Harris and Wolpert's (1998) proposal, or the optimal control feedback model, internal models are very powerful tools for dealing both with the many-to-one nature of the relations between motor commands and vocal tract configurations or spectral characteristics of the acoustic signal, as well as with the long latency of feedback processed by the cortex, and with the possible multimodality of the speech task representations. A...

Table of contents

  1. Cover
  2. Half Title
  3. Full Title
  4. Copyright
  5. Contents
  6. About the Editors
  7. Contributors
  8. Introduction
  9. PART 1: MODELS
  10. PART 2: PHONETICS AND CROSS-LINGUISTIC ANALYSES
  11. PART 3: TECHNIQUES
  12. Author Index
  13. Subject Index