Speech and Language Technology for Language Disorders

Author: Katharine Beals, Deborah Dahl, Ruth Fink, Marcia Linebarger

225 pages
English

About This Book

This book draws on the recent remarkable advances in speech and language processing: advances that have moved speech technology beyond basic applications such as medical dictation and telephone self-service to increasingly sophisticated and clinically significant applications aimed at complex speech and language disorders. The book provides an introduction to the basic elements of speech and natural language processing technology, and illustrates their clinical potential by reviewing speech technology software currently in use for disorders such as autism and aphasia. The discussion is informed by the authors' own experiences in developing and investigating speech technology applications for these populations. Topics include detailed examples of speech and language technologies in both remediative and assistive applications, overviews of a number of current applications, and a checklist of criteria for selecting the most appropriate applications for particular user needs.

This book will be of benefit to four audiences: application developers who are looking to apply these technologies; clinicians who are looking for software that may be of value to their clients; students of speech-language pathology and application development; and finally, people with speech and language disorders and their friends and family members.

Information

Publisher: De Gruyter
Year: 2015
ISBN: 9781614519256
Topic: Technology & Mechanical Engineering
Subtopic: Electrical Engineering & Telecommunications

Deborah Dahl

1 Overview of speech and language technologies

Abstract: This chapter provides a technical overview and description of the state of the art for current speech and language processing technologies. It focuses on the technologies that have been particularly useful in assistive and remediative applications for people with speech and language disorders. The major technologies discussed include speech recognition, natural language processing, dialog management, and text-to-speech. The chapter also briefly reviews other related technologies such as avatars, text simplification, and natural language generation.

1.1 Introduction to speech and language technologies

Speech and language technologies are technologies that allow computers to perform some of the functions of human linguistic communication, including recognizing and understanding speech, reading text out loud, and engaging in a conversation. Although human abilities to communicate with each other far outstrip the current state of the art in speech and language technologies, the technologies are progressing rapidly and are certainly suitable for application to specific, well-defined problems. No single, all-encompassing spoken language understanding system that can be applied to every situation will appear any time soon, but in specific contexts and for specific needs, these technologies in their current state may well be extremely helpful.

The technologies that will be discussed in this chapter do not in most cases serve those with speech and language disorders directly. Rather, these technologies are more typically deployed to supply speech- and language-processing capabilities as part of applications that, in turn, are specifically dedicated to these populations.

The entire field of speech and language technologies is very broad and can be broken down into many very specialized technologies. We will focus here on the subset of speech and language technologies that show particular promise for use in addressing language disorders. The main focus will be on speech recognition (sometimes also called speech-to-text), natural language understanding, and dialog systems. However, other emerging technologies such as text simplification and natural language generation can potentially play a role in addressing speech and language disorders, so these technologies will also be mentioned briefly.

We will primarily be concerned with assistive and remediative applications of the technologies. However, some of the technologies can also be applied toward other goals, for example, automatic assessment of users’ capabilities and automatic logging and record keeping for clinical and research purposes. We will also touch on these types of applications.

We will focus in this chapter on the technologies themselves, regardless of how they are used in specific applications or research projects, noting that in almost every case, basic technologies will be combined with other software (and hardware) to create specific applications.

Because speech and language technologies are modeled on human capabilities, it is useful to discuss them in the context of a complete system that models a human conversational participant; that is, an interactive dialog system. Conversations between people go back and forth between the conversational participants, each participant speaking and listening at different times. This back-and-forth pattern is called turn-taking, and each speaker’s contribution is called a turn. In the majority of normal conversations, each turn is more or less related to the previous speaker’s turn. Thus, participating in a human-human conversation requires skills in listening, understanding, deciding what to say, composing an appropriate response, and speaking. These skills are mirrored in the technologies that are used to build spoken dialog systems: speech recognition, natural language understanding, dialog management, natural language generation, and text-to-speech (TTS). For people with speech and language disorders, then, these separate technologies can potentially be applied to compensate for disorders that affect each of these skills.

Figure 1.1 is an example of a complete interactive dialog system. A user speaks or types to the system. Spoken input is first converted to text by speech recognition; the natural language understanding component then processes the user’s input and represents it in a structured way that a computer can use. The dialog management component acts on the user’s input and decides what to do next. The next action might be a response to the user, an interaction with the user’s environment, or feedback to the user on their input. Unlike conversations between people, where responses are almost always linguistic, responses in an interactive dialog system can also take the form of displayed text or graphics.

Fig. 1.1: Complete interactive dialog system.
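The chain of components in Fig. 1.1 can be sketched in Python as a set of pluggable functions. This is a minimal sketch under stated assumptions, not any particular system's design: the class `DialogSystem`, its field names, and the toy components in the usage example are hypothetical, and each would be replaced by a real speech recognizer, natural language understanding component, and so on. Making `recognize` and `synthesize` optional reflects the point that typed input and text-only output are also possible.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

# Each component is modeled as a plain function; these type aliases are
# illustrative assumptions, not a standard interface.
Recognizer = Callable[[bytes], str]      # speech recognition: audio -> text
Understander = Callable[[str], dict]     # natural language understanding: text -> meaning
DialogManager = Callable[[dict], dict]   # dialog management: meaning -> next action
Generator = Callable[[dict], str]        # natural language generation: action -> text
Synthesizer = Callable[[str], bytes]     # text-to-speech: text -> audio


@dataclass
class DialogSystem:
    understand: Understander
    manage: DialogManager
    generate: Generator
    recognize: Optional[Recognizer] = None    # omitted when input is typed
    synthesize: Optional[Synthesizer] = None  # omitted when output is text only

    def take_turn(self, user_input: Union[bytes, str]) -> dict:
        """Run one user turn through the pipeline (spoken input as bytes, typed as str)."""
        if isinstance(user_input, bytes):
            if self.recognize is None:
                raise ValueError("spoken input requires a speech recognizer")
            text = self.recognize(user_input)
        else:
            text = user_input
        meaning = self.understand(text)
        action = self.manage(meaning)
        response = self.generate(action)
        audio = self.synthesize(response) if self.synthesize else None
        return {"text": text, "meaning": meaning, "action": action,
                "response": response, "audio": audio}


# Usage with toy, hypothetical components (typed input, text-only output).
system = DialogSystem(
    understand=lambda text: {"intent": "greet" if "hello" in text.lower() else "unknown"},
    manage=lambda meaning: {"act": "greet_back" if meaning["intent"] == "greet" else "ask_repeat"},
    generate=lambda action: "Hello!" if action["act"] == "greet_back" else "Could you say that again?",
)
result = system.take_turn("Hello there")
print(result["response"])  # Hello!
```

Because each stage is just a function, stages can be omitted or swapped, which is how simpler configurations can reuse subsets of the same technologies.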

As we will see in the rest of this book, these technologies can be mixed and matched in a variety of ways in different applications to address different remediation or assistive goals. As an example, Fig. 1.2 shows a simpler version of a spoken dialog system, designed to provide the user with feedback on their speech or on individual spoken words. It does not attempt to provide the user with feedback on language, so it does not require a natural language understanding component. Rather, speech is recognized, and the recognized speech is sent to the dialog management component, which then provides the user with feedback in the form of text, audio, and graphics. This is the general structure of MossTalk Words, discussed in Chapter 8.

Fig. 1.2: Speech/lexical feedback components.
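A hypothetical sketch of the feedback step in a configuration like Fig. 1.2 follows. It assumes the recognizer's output is already available as text and simply compares it with the target word; no language understanding is involved. The function name and the feedback wording are illustrative assumptions, not the actual behavior of MossTalk Words.

```python
def lexical_feedback(recognized: str, target: str) -> str:
    """Compare the recognizer's output with the target word and return feedback text.

    `recognized` stands in for the output of a speech recognizer (hypothetical here).
    """
    heard = recognized.strip().lower()
    expected = target.strip().lower()
    if heard == expected:
        return f"Correct! You said '{target}'."
    if not heard:
        # Nothing was recognized, e.g. the user was silent or too quiet.
        return "I didn't hear anything. Please try again."
    return f"I heard '{recognized.strip()}'. Try saying '{target}' again."


print(lexical_feedback("hat", "cat"))  # I heard 'hat'. Try saying 'cat' again.
```

In a full application, the same comparison result could also drive audio or graphical feedback rather than only text.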

Fig. 1.3: System for language/grammar feedback.

As another example, a system designed to provide the user only with feedback on their language would look more like the system shown in Fig. 1.3. Here the user speaks or types to the system with the intention of producing a more or less complete sentence. Because this kind of system focuses on feedback about language rather than speech, input can be spoken or, where that is appropriate for the application and its users, typed. GrammarTrainer, discussed in Chapter 3, is a system of this kind: it helps users with autism improve their grammar, and users interact with it through typed input.

Another system with a similar organization is the Aphasia Therapy System for users with aphasia, discussed in Chapter 9, which analyzes users’ spoken language and provides detailed feedback on their productions.
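For the language-feedback configuration in Fig. 1.3, the core step is analyzing a sentence and reporting problems. The sketch below is a deliberately tiny, hypothetical stand-in for a natural language understanding component: it checks subject-verb agreement between an adjacent noun and verb using a hand-built lexicon (`SINGULAR_NOUNS`, `PLURAL_NOUNS`, and `VERBS` are made up for illustration). It is not how GrammarTrainer or the Aphasia Therapy System analyze input; a real system would use a full grammar or parser.

```python
# Tiny hypothetical lexicon; a real system would use a full grammar or parser.
SINGULAR_NOUNS = {"boy", "girl", "dog", "cat"}
PLURAL_NOUNS = {"boys", "girls", "dogs", "cats"}
VERBS = {"run": "runs", "eat": "eats", "sleep": "sleeps"}  # base form -> 3rd-person singular
THIRD_SINGULAR = {v: k for k, v in VERBS.items()}          # "runs" -> "run"


def grammar_feedback(sentence: str) -> str:
    """Return feedback on subject-verb agreement for adjacent noun-verb pairs."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    for noun, verb in zip(words, words[1:]):
        if noun in SINGULAR_NOUNS and verb in VERBS:
            return f"Try '{noun} {VERBS[verb]}' instead of '{noun} {verb}'."
        if noun in PLURAL_NOUNS and verb in THIRD_SINGULAR:
            return f"Try '{noun} {THIRD_SINGULAR[verb]}' instead of '{noun} {verb}'."
    # No agreement error found; check that there is a known verb at all.
    if not any(w in VERBS or w in THIRD_SINGULAR for w in words):
        return "Your sentence seems to be missing a verb."
    return "Looks good!"


print(grammar_feedback("The boy run."))  # Try 'boy runs' instead of 'boy run'.
```

A check limited to adjacent words misses many kinds of errors, which is one reason systems that give detailed grammatical feedback rely on fuller linguistic analysis.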

Another type of organization is shown in Fig. 1.4. This system allows the user to record short pieces of speech and assemble them into longer spoken sentences or series of sentences. An example of this type of system is SentenceShaper®, discussed in Chapter 8. The dialog manager in this case is simply the software that reacts to the user’s commands to record and play back speech at different levels.
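The record-and-assemble organization in Fig. 1.4 is largely a matter of bookkeeping: the software stores recorded fragments and plays them back individually, as sentences, or as a whole. The sketch below models this with a hypothetical `UtteranceWorkspace` class, assuming audio is represented as raw bytes; capturing audio from a microphone and sending it to a speaker are left out. It is an illustration of the organization, not SentenceShaper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UtteranceWorkspace:
    """Hypothetical store for user-recorded speech fragments (audio as raw bytes)."""
    fragments: Dict[str, bytes] = field(default_factory=dict)  # fragment id -> audio
    sentences: List[List[str]] = field(default_factory=list)   # sentences as fragment ids

    def record(self, fragment_id: str, audio: bytes) -> None:
        """Store (or re-record) a short piece of speech."""
        self.fragments[fragment_id] = audio

    def build_sentence(self, fragment_ids: List[str]) -> int:
        """Assemble recorded fragments, in order, into a sentence; return its index."""
        missing = [f for f in fragment_ids if f not in self.fragments]
        if missing:
            raise KeyError(f"fragments not yet recorded: {missing}")
        self.sentences.append(list(fragment_ids))
        return len(self.sentences) - 1

    def play_fragment(self, fragment_id: str) -> bytes:
        return self.fragments[fragment_id]

    def play_sentence(self, index: int) -> bytes:
        """Concatenate a sentence's fragments; re-recorded fragments play in their new form."""
        return b"".join(self.fragments[f] for f in self.sentences[index])

    def play_all(self) -> bytes:
        """Play every assembled sentence in order."""
        return b"".join(self.play_sentence(i) for i in range(len(self.sentences)))


ws = UtteranceWorkspace()
ws.record("subject", b"the boy ")
ws.record("predicate", b"kicks the ball")
first = ws.build_sentence(["subject", "predicate"])
print(ws.play_sentence(first))  # b'the boy kicks the ball'
```

Storing sentences as lists of fragment ids, rather than copies of the audio, means a user can re-record one piece and hear the change wherever it is used.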

The next few sections will discuss in more detail the individual technologies that make up these systems. This material can be treated as background reading. It is useful for understanding these technologies and their limitations as applied to speech and language disorders, especially for developers, but readers who are not interested in the details of the underlying technologies can skip the rest of this chapter.

Fig. 1.4: System for user-initiated control and playback of user utterances.

1.2 Speech recognition

1.2.1 What is speech recognition?

Speech recognition is the technology that enables a computer to turn speech into written language. It is sometimes called “speech-to-text”. In technical contexts it is called automatic speech recognition (ASR), to distinguish it from human speech recognition. One way to think of a speech recognizer is as the software counterpart of a human stenographer or transcriptionist: it simply records the words that it hears, without attempting to understand them.

Speech recognition starts with capturing speech through a microphone, which converts sound (physically, a sequence of rapid changes in air pressure) into an electrical signal that mirrors it, called the waveform. Perhaps surprisingly, the waveform for what we perceive as a sequence of words contains no physical gaps corresponding to the word boundaries we perceive. There are rarely silences between words in actual speech, and conversely, there can be silences in the middle of words that we do not perceive as silence. Moreover, the same sound can be produced in many physically different ways that human listeners nonetheless hear as the same sound. Beyond the words themselves, many other factors affect the physical sounds of speech, including the speaker’s accent and age, how clearly and how rapidly the speech is articulated, and whether the conversation is casual or formal. In the real world, speech is also inevitably mixed with other sounds in the environment, such as noise, music, and other people’s speech. One of the most difficult problems in speech recognition research today is separating the speech that a system is interested in from these other sounds, particularly from other speech. For all of these reasons, the technologies that convert sounds to written words are very complex.

Fig. 1.5: Speech waveform for the word “speech” spoken three times.

As an example, Fig. 1.5 shows the waveform for the word “speech” spoken three times, with the sounds mapped to the parts of the waveform to which they correspond. Distance from the center line indicates the amount of energy in the signal at that point. Note that no two of the three look exactly the same, even though they were spoken by the same person at almost the same time. We can also see that the “ch” at the end of each “speech” merges into the “s” at the beginning of the next word without any actual silence (which would appear as a flat line in the waveform). Also note that each “p” and each “ch” begins with a brief silence, pointed to by the arrow for the first “p”, that we do not hear as a silence.
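The observation that word boundaries are not marked by silence can be demonstrated with a simple energy-based segmenter, which splits a signal wherever the short-time energy falls below a threshold. In this sketch, pure tones stand in for speech (an assumption made to keep the example self-contained), and the sample rate, frame length, and threshold are arbitrary choices. Two “words” with no pause between them come out as one segment, while a single “word” containing a brief closure, like the silence before a “p”, comes out as two.

```python
import math

SAMPLE_RATE = 8000  # samples per second (arbitrary choice for this sketch)


def tone(freq_hz, n_samples, amplitude=0.5):
    """A pure sine tone standing in for a stretch of voiced speech."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]


def silence(n_samples):
    return [0.0] * n_samples


def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame of frame_len samples."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def silence_segments(samples, frame_len=80, threshold=0.01):
    """Return (start_frame, end_frame) pairs, end exclusive, for stretches above threshold."""
    energies = short_time_energy(samples, frame_len)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i                      # a non-silent stretch begins
        elif e < threshold and start is not None:
            segments.append((start, i))    # a silent frame ends the stretch
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


# Two "words" spoken with no pause between them: one segment is found.
run_together = tone(300, 1600) + tone(500, 1600)
# One "word" with a brief internal closure: two segments are found.
with_closure = tone(300, 800) + silence(240) + tone(300, 800)
print(silence_segments(run_together))  # [(0, 40)]
print(silence_segments(with_closure))  # [(0, 10), (13, 23)]
```

Both results are wrong from a listener's point of view, which is one reason recognizers generally cannot split the signal at silences first and recognize the pieces afterwards; word boundaries have to be inferred along with the words.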

The following discussion presents a very high-level overview of how today’s speech recognition technology works. Speech recognition is the process of matching highly variable waveforms, like those in Fig. 1.5, to the sounds and words of a language. Because the waveforms vary so much, speech recognition is heavily statistical: it relies on large amounts of previously transcribed speech, which provide examples of how sounds (the signal) match up with the words of a language. In essence, the recognizer looks for the best match between the signal and the words of the language, but mistakes, called misrecognitions, are quite possible, particularly under challenging conditions that make the speech harder to hear.
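One standard way to formalize “finding the best match” is to choose the word sequence W that maximizes P(signal | W) × P(W): how well W’s sounds fit the signal (scored by an acoustic model) times how likely W is in the language (scored by a language model). The text above does not commit to this particular formulation; the sketch below applies it to a single word with made-up probabilities, adding log probabilities rather than multiplying probabilities to avoid numerical underflow.

```python
import math


def best_match(acoustic_likelihoods, prior_probs):
    """Pick the word W that maximizes P(signal | W) * P(W), computed in log space.

    Both dicts map candidate words to probabilities. In a real recognizer these
    come from models trained on transcribed speech; here they are hypothetical.
    """
    def score(word):
        return math.log(acoustic_likelihoods[word]) + math.log(prior_probs[word])
    return max(acoustic_likelihoods, key=score)


# Made-up numbers: the signal sounds slightly more like "speed",
# but "speech" is much more likely as a word in this context.
acoustic = {"speech": 0.30, "speed": 0.40, "peach": 0.30}
prior = {"speech": 0.050, "speed": 0.010, "peach": 0.005}
print(best_match(acoustic, prior))  # speech
```

The same arithmetic shows how misrecognitions arise: if noise shifts the acoustic scores or the context makes a wrong word more probable, a different candidate can win.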

After the waveform is captured, the next step in speech recognition is to analyze it into its component frequencies. Speech, like all sounds, can be broken down into a combination of frequencies, that is, different rates of vibration in the sound. Frequencies are measured in cycles per second, or hertz (Hz). We perceive lower frequencies as lower-pitched sounds and higher frequencies as higher-pitched sounds. The energy present in the signal at different frequencies is referred to as the spectrum.
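Decomposing a signal into its component frequencies is what the discrete Fourier transform (DFT) does. The sketch below builds a signal from a 200 Hz component and a weaker 1000 Hz component and recovers both frequencies with a naive DFT written in plain Python. Real systems use a fast Fourier transform (for example, `numpy.fft`), and the sample rate and signal length here are arbitrary choices for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Naive DFT: (frequency_hz, magnitude) pairs from 0 Hz up to, not including, Nyquist."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        spectrum.append((k * sample_rate / n, abs(coeff) / n))
    return spectrum


def strongest_frequencies(samples, sample_rate, count=2):
    """Frequencies of the `count` largest spectral peaks, in ascending order."""
    top = sorted(magnitude_spectrum(samples, sample_rate),
                 key=lambda pair: pair[1], reverse=True)[:count]
    return sorted(freq for freq, _ in top)


sample_rate = 8000
n = 400  # 50 ms of signal; frequency resolution is 8000 / 400 = 20 Hz
# A lower-pitched component (200 Hz) plus a weaker, higher-pitched one (1000 Hz).
signal = [math.sin(2 * math.pi * 200 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
print(strongest_frequencies(signal, sample_rate))  # [200.0, 1000.0]
```

The magnitude at each frequency (whose square is proportional to the energy there) is the kind of information the spectrum captures; the larger peak at 200 Hz reflects the stronger low-frequency component.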

The spectrum is more useful in speech recognition than the waveform because it shows more clearly the amount of energy present at different frequencies at each point in time. This energy is very diagnostic of the specific sp...
