Part I
History, scope, and techniques
1
History of speech synthesis
Brad H. Story
Introduction
For the past two centuries or more, a variety of devices capable of generating artificial or synthetic speech have been developed and used to investigate phonetic phenomena. The aim of this chapter is to provide a brief history of synthetic speech systems, including mechanical, electrical, and digital types. The primary goal, however, is not to reiterate the details of constructing specific synthesizers but rather to focus on the motivations for developing various synthesis paradigms and illustrate how they have facilitated research in phonetics.
The mechanical and electro-mechanical era
On the morning of December 20, 1845, a prominent American scientist attended a private exhibition of what he would later refer to as a “wonderful invention.” The scientist was Joseph Henry, an expert on electromagnetic induction and the first Secretary of the Smithsonian Institution. The “wonderful invention” was a machine that could talk, meticulously crafted by a disheveled 60-year-old tinkerer from Freiburg, Germany named Joseph Faber. Their unlikely meeting in Philadelphia, Pennsylvania, arranged by an acquaintance of Henry from the American Philosophical Society, might have occurred more than a year earlier had Faber not destroyed a previous version of his talking machine in a bout of depression and intoxication. Although he had spent some 20 years perfecting the first device, Faber was able to reconstruct a second version of equal quality in a year’s time (Patterson, 1845).
The layout of the talking machine, described in a letter from Henry to his colleague H.M. Alexander, was like that of a small chamber organ whose keyboard was connected via strings and levers to mechanical constructions of the speech organs. A carved wooden face was fitted with a hinged jaw, and behind it was an ivory tongue that was moveable enough to modulate the shape of the cavity in which it was housed. A foot-operated bellows supplied air to a rubber glottis whose vibration provided the raw sound that could be shaped into speech by pressing various sequences or combinations of 16 keys available on a keyboard. Each key was marked with a symbol representing an “elementary” sound that, through its linkage to the artificial organs, imposed time-varying changes to the air cavity appropriate for generating apparently convincing renditions of connected speech. Several years earlier Henry had been shown a talking machine built by the English scientist Charles Wheatstone, but he noted that Faber’s machine was far superior because instead of uttering just a few words, it was “capable of speaking whole sentences composed of any words what ever” (Rothenberg et al., 1992, p. 362).
In the same letter, Henry mused about the possibility of placing two or more of Faber’s talking machines at various locations and connecting them via telegraph lines. He thought that with “little contrivance” a spoken message could be coded as keystrokes in one location which, through electromagnetic means, would set into action another of the machines to “speak” the message to an audience at a distant location. Another 30 years would pass before Alexander Graham Bell demonstrated his invention of the telephone, yet Henry had already conceived of the notion while witnessing Faber’s machine talk. Further, unlike Bell’s telephone, which transmitted an electrical analog of the speech pressure wave, Henry’s description alluded to representing speech in compressed form, based on the slowly varying movements of the operator’s hands, fingers, and feet as they formed the keystroke sequences required to produce an utterance – a signal processing technique that would not be implemented in telephone transmission systems for nearly another century.
It is remarkable that, at this moment in history, a talking machine had been constructed that was capable of transforming a type of phonetic representation into a simulation of speech production, resulting in an acoustic output heard clearly as intelligible speech – and this same talking machine had inspired the idea of electrical transmission of low-bandwidth speech. The moment is also ironic, however, considering that no one seized either development as an opportunity for scientific or technological advancement. Henry understandably continued on with his own scientific pursuits, leaving his idea to one short paragraph in an obscure letter to a colleague. In need of funds, Faber signed on with the entertainment entrepreneur P.T. Barnum in 1846 to exhibit his talking machine for a run of several months at the Egyptian Hall in London. In his autobiography, Barnum (1886) noted that a repeat visitor to the exhibition was the Duke of Wellington, whom Faber eventually taught to “speak” both English and German phrases with the machine (Barnum, 1886, p. 134). In the exhibitor’s autograph book, the Duke wrote that Faber’s “Automaton Speaker” was an “extraordinary production of mechanical genius.” Other observers also noted the ingenuity in the design of the talking machine (e.g., “The Speaking Automaton,” 1846; Athenaeum, 1846), but to Barnum’s puzzlement it was not successful in drawing public interest or revenue. Faber and his machine were eventually relegated to a traveling exhibit that toured the villages and towns of the English countryside; it was supposedly here that Faber ended his life by suicide, although there is no definitive account of the circumstances of his death (Altick, 1978).
In any case, Faber disappeared from the public record, although his talking machine continued to make sideshow-like appearances in Europe and North America over the next 30 years; it seems a relative (perhaps a niece or nephew) may have inherited the machine and performed with it to generate income (“Talking Machine,” 1880; Altick, 1978).
Although the talking machine caught the serious attention of those who understood the significance of such a device, the overall muted interest may have been related to Faber’s lack of showmanship, the German accent that was present in the machine’s speech regardless of the language spoken, and perhaps the fact that Faber never published any written account of how the machine was designed or built – or maybe a mechanical talking machine, however ingenious its construction, was, by 1846, simply considered passé. Decades earlier, others had already developed talking machines that had impressed both scientists and the public. Most notable were Christian Gottlieb Kratzenstein and Wolfgang von Kempelen, both of whom had independently developed mechanical speaking devices in the late 18th century.
Inspired by a competition sponsored by the Imperial Academy of Sciences at St. Petersburg in 1780, Kratzenstein submitted a report that detailed the design of five organ pipe-like resonators that, when excited with the vibration of a reed, produced the vowels /a, e, i, o, u/ (Kratzenstein, 1781). Although their shape bore little resemblance to human vocal tract configurations, and they could produce only sustained sounds, the construction of these resonators won the prize and marked a shift toward scientific investigation of human sound production. Kratzenstein, who at the time was a Professor of Physics at the University of Copenhagen, had shared a long-term interest in studying the physical nature of speaking with a former colleague at St. Petersburg, Leonhard Euler, who likely proposed the competition. Well known for his contributions to mathematics, physics, and engineering, Euler wrote in 1761 that “all the skill of man has not hitherto been capable of producing a piece of mechanism that could imitate [speech]” (p. 78) and further noted that “The construction of a machine capable of expressing sounds, with all the articulations, would no doubt be a very important discovery” (Euler, 1761, p. 79). He envisioned such a device to be used in assistance of those “whose voice is either too weak or disagreeable” (Euler, 1761, p. 79).
During the same time period, von Kempelen, a Hungarian engineer, industrialist, and government official, used his spare time and mechanical skills to build a talking machine far more advanced than the five vowel resonators demonstrated by Kratzenstein. The final version of his machine was to some degree a mechanical simulation of human speech production. It included a bellows as a “respiratory” source of air pressure and air flow, a wooden “wind” box that emulated the trachea, a reed system to generate the voice source, and a rubber funnel that served as the vocal tract. There was an additional chamber used for nasal sounds, and other control levers that were needed for particular consonants. Although it was housed in a large box, the machine itself was small enough that it could have been easily held in the hands. Speech was produced by depressing the bellows, which caused the “voice” reed to vibrate. The operator then manipulated the rubber vocal tract into time-varying configurations that, along with controlling other ports and levers, produced speech at the word level, but could not generate full sentences due to the limitations of air supply and perhaps the complexity of controlling the various parts of the machine with only two hands. The sound quality was child-like, presumably due to the high fundamental frequency of the reed and the relatively short rubber funnel serving as the vocal tract. In an historical analysis of von Kempelen’s talking machine, Dudley and Tarnoczy (1950) note that this quality was probably deliberate because a child’s voice was less likely to be criticized when demonstrating the function of the machine. Kempelen may have been particularly sensitive to criticism considering that he had earlier constructed and publicly demonstrated a chess-playing automaton that was in fact a hoax (cf., Carroll, 1975). Many observers initially assumed that his talking machine was merely a fake as well.
Kempelen’s lasting contribution to phonetics is his prodigious written account of not only the design of his talking machine, but also the nature of speech and language in general (von Kempelen, 1791). In “On the Mechanism of Human Speech” [English translation], he describes the experiments that consumed more than 20 years and clearly showed the significance of using models of speech production and sound generation to study and analyze human speech. This work motivated much subsequent research on speech production, and to this day still guides the construction of replicas of his talking machine for pedagogical purposes (cf., Trouvain and Brackhane, 2011).
One person particularly inspired by von Kempelen’s work was, in fact, Joseph Faber. According to a biographical sketch (Wurzbach, 1856), while recovering from a serious illness in about 1815, Faber happened onto a copy of “On the Mechanism of Human Speech” and became consumed with the idea of building a talking machine. Of course, he built not a replica of von Kempelen’s machine, but one with a significantly advanced system of controlling the mechanical simulation of speech production. As remarkable as Faber’s machine seems to have been regarded by some observers, Faber was indeed late to the party, so to speak, for the science of voice and speech had by the early 1800s already shifted into the realm of physical acoustics. Robert Willis, a professor of mechanics at Cambridge University, was dismayed by both Kratzenstein’s and von Kempelen’s reliance on trial-and-error methods in building their talking machines, rather than acoustic theory. He took them to task, along with most others working in phonetics at the time, in his 1829 essay titled “On the Vowel Sounds, and on Reed Organ-Pipes.” The essay begins:
The generality of writers who have treated on the vowel sounds appear never to have looked beyond the vocal organs for their origin. Apparently assuming the actual forms of these organs to be essential to their production, they have contented themselves with describing with minute precision the relative positions of the tongue, palate and teeth peculiar to each vowel, or with giving accurate measurements of the corresponding separation of the lips, and of the tongue and uvula, considering vowels in fact more in the light of physiological functions of the human body than as a branch of acoustics.
(Willis, 1829, p. 231)
Willis laid out a set of experiments in which he would investigate vowel production by deliberately neglecting the organs of speech. He built reed-driven organ pipes whose lengths could be increased or decreased with a telescopic mechanism, and then determined that an entire series of vowels could be generated with changes in tube length and reeds with different vibrational frequencies. Wheatstone (1837) later pointed out that Willis had essentially devised an acoustic system that, by altering tube length, and hence the frequencies of the tube resonances, allowed for selective enhancement of harmonic components of the vibrating reed. Wheatstone further noted that multiple resonances are exactly what is produced by the “cavity of the mouth,” and so the same effect occurs during speech production but with a nonuniformly shaped tube.
Understanding speech as a pattern of spectral components became a major focus of acousticians studying speech communication for much of the 19th century and the very early part of the 20th century. As a result, developments of machines to produce speech sounds were also largely based on some form of spectral addition, with little or no reference to the human speech organs. For example, in 1859 the German scientist Hermann Helmholtz devised an electromagnetic system for maintaining the vibration of a set of eight or more tuning forks, each variably coupled to a resonating chamber to control amplitude (Helmholtz, 1859, 1875). With careful choice of frequencies and amplitude settings he demonstrated the artificial generation of five different vowels. Rudolph Koenig, a well-known acoustical instrument maker in the 1800s, improved on Helmholtz’s design and produced commercial versions that were sold to interested clients (Pantalony, 2004). Koenig was also a key figure in emerging technology that allowed for recording and visualization of sound waves. His invention of the phonautograph with Édouard-Léon Scott in 1859 transformed sound via a receiving cone, diaphragm, and stylus into a pressure waveform etched on smoked paper rotating about a cylinder. A few years later he introduced an alternative instrument in which a flame would flicker in response to a sound, and the movements of the flame were captured on a rotating mirror, again producing a visualization of the sound as a waveform (Koenig, 1873).
These approaches were precursors to a device called the “phonodeik,” which would later be developed at the Case School of Applied Science by Dayton Miller (1909), who eventually used it to study waveforms of sounds produced by musical instruments and human vowels. In a publication documenting several lectures given at the Lowell Institute in 1914, Miller (1916) describes both the analysis of sound based on photographic representations of waveforms produced by the phonodeik, as well as intricate machines that could generate complex waveforms by adding together sinusoidal components and display the final product graphically so that it might be compared to those waveforms captured with the phonodeik. Miller referred to this latter process as harmonic synthesis, a term commonly used to refer to building complex waveforms from basic sinusoidal elements. This is, however, the first instance of the word “synthesis” in the present chapter; the delay has been deliberate, to remain true to the original references. Nowhere in the literature on Kratzenstein, von Kempelen, Wheatstone, Faber, Willis, or Helmholtz does “synthesis” or “speech synthesis” appear. Their devices were variously referred to as talking machines, automatons, or simply systems that generated artificial speech. Miller’s use of synthesis in relation to human vowels seems to have had the effect of labeling any future system that produces artificial speech, regardless of the theory on which it is based, a speech synthesizer.
Interestingly, the waveform synthesis described by Miller was not actually synthesis of sound, but rather synthes...