Voice Technologies for Speech Reconstruction and Enhancement
eBook - ePub

  1. 228 pages
  2. English
  3. ePUB (mobile friendly)

About this book

The book explores new ways to reconstruct and enhance speech that is compromised by various neuro-motor disorders – collectively known as "dysarthria." The authors address some of the gaps in speech research on dysarthric conditions: they show how new methods can improve speaker recognition when speech is impaired due to developmental or acquired pathologies; they present a novel multi-dimensional approach that helps a speech system both assess dysarthric speech and improve the intelligibility of the impaired speech; they describe well-performing software solutions for developmental and acquired speech impairments, and for vocal injuries; and they examine non-acoustic signals and muted nonverbal sounds in relation to audible speech conversion.

Voice Technologies for Speech Reconstruction and Enhancement, by Hemant A. Patil and Amy Neustein, is available in PDF and/or ePUB format and is listed under Technology & Engineering & Computer Science General.

Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population

1 State-of-the-art speaker recognition methods applied to speakers with dysarthria

Mohammed Senoussaoui
École de technologie supérieure (ÉTS) and Fluent.ai, Canada
Milton O. Sarria-Paja
Universidad Santiago de Cali
Patrick Cardinal
Tiago H. Falk
Institut National de la Recherche Scientifique, Centre Énergie, Matériaux, Télécommunications and MuSAE Lab, Canada
François Michaud
Université Sherbrooke, Canada

Abstract

Speech-based biometrics is one of the most effective approaches to identity management and one of the methods preferred by users and companies given its flexibility, speed and reduced cost. Current state-of-the-art speaker recognition systems are known to be strongly dependent on the condition of the speech material provided as input and can be affected by unexpected variability presented during testing, such as environmental noise, changes in vocal effort or pathological speech due to speech and/or voice disorders. In this chapter, we are particularly interested in understanding the effects of dysarthric speech on automatic speaker identification performance. We explore several state-of-the-art feature representations, including i-vectors, bottleneck neural-network-based features and a covariance-based feature representation. High-level features, such as i-vectors and covariance-based features, are built on top of four different low-level representations of dysarthric/control speech signals. On the TORGO and NEMOURS databases, our best single system achieved an accuracy of 98.7%, outperforming results previously reported for these databases.
Keywords: speaker recognition, dysarthria, i-vectors, covariance features, bottleneck features
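
As a rough illustration of what a covariance-based representation built on low-level features can look like, the sketch below summarizes a variable-length sequence of frame-level feature vectors by the upper triangle of their sample covariance matrix, which yields a fixed-length vector. This is a minimal sketch under stated assumptions, not the chapter's actual recipe: the function name `covariance_features` and the toy frames are hypothetical, and any further mapping or normalization the authors may apply to the covariance matrix is omitted.

```python
def covariance_features(frames):
    """Summarize a sequence of frame-level feature vectors by the upper
    triangle (diagonal included) of their sample covariance matrix.

    Hypothetical helper for illustration only.
    """
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Unbiased sample covariance between every pair of dimensions.
    cov = [
        [
            sum((f[i] - means[i]) * (f[j] - means[j]) for f in frames) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    # The matrix is symmetric, so the upper triangle carries all the information.
    return [cov[i][j] for i in range(dim) for j in range(i, dim)]


# Toy "utterance": three frames of a 2-dimensional low-level feature.
toy_frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
features = covariance_features(toy_frames)
print(features)  # [4.0, 8.0, 16.0]
```

For frames of dimension d, the resulting vector has d(d + 1)/2 entries regardless of how many frames the utterance contains, which is what makes it usable as input to a standard classifier.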

1.1 Introduction

Human speech is a natural, unique, complex and flexible mode of communication that conveys phonological, morphological, syntactic and semantic information provided within the utterance [1]. It also conveys traits related to identity, age, emotional or health states, to name a few [2, 3]. This information has been useful across a number of domains; for example, automatic speech recognition (ASR) has opened doors for speech to be used as a reliable human–machine interface [4, 5]. Advances in speaker recognition (SR) technologies, in turn, have allowed humans to use their voice to, for example, authenticate themselves into their bank’s automated phone system [6].
Under most circumstances, speech-enabled applications have been developed to work in clean environments and assume clear and normal adult speech. These assumptions and conditions, however, are difficult to satisfy in many real-world environments. Speech production requires integrity and integration of numerous neurological and musculoskeletal activities. Many factors, such as accidents or diseases, however, can affect the quality and intelligibility of produced speech [7, 8]. These modifications are usually referred to as speech disorders, and their effects can be observed in individuals of varying age groups and for different reasons. Dysarthria is a particular speech motor disorder caused by damage to the nervous system and characterized by a substantive decrease in speech intelligibility [9, 10, 11]. Reduced intelligibility can negatively influence an individual’s life in several ways, including social interactions, access to employment, education or interaction with automated systems [12].
Many speech-enabled applications have thrived due to the recent proliferation of mobile devices. Nevertheless, while the ubiquity of smartphones has opened a pathway for new speech applications, it is imperative that the performance of such systems be tested for pathological speech, such that corrective measures can be taken, if needed.
At present, most digital speech-processing research on speech disorders has focused on ASR, enhancement and speech intelligibility assessment [13, 14, 15]. Other emerging speech applications, such as SR, language identification and emotion recognition, have yet to be explored. This chapter focuses on the SR problem. SR technologies are burgeoning for identity management as they eliminate the need for personal identification numbers, passwords and security questions [16]. In this regard, Gaussian mixture models (GMMs) combined with mel-frequency cepstral coefficients (MFCCs) as feature vectors, known as the GMM-MFCC paradigm, were for many years the dominant approach to text-independent SR [6]. Over the last decade, the i-vector feature representation [17] has become the state of the art for text-independent SR, as well as for many other speech-related fields such as language recognition [18, 19, 20, 21] and ASR [22]. Recently, the i-vector framework was also successfully applied to objective dysarthria intelligibility assessment [23].
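
To make the GMM-MFCC paradigm concrete, the sketch below trains one diagonal-covariance GMM per speaker with plain expectation-maximization and identifies a test utterance as the speaker whose model gives the highest average frame log-likelihood. It is a minimal illustration under stated assumptions, not the system evaluated in the chapter: the frames are synthetic stand-ins for MFCC vectors, the speaker labels `spk_a` and `spk_b` and all function names are hypothetical, and common refinements such as universal background model adaptation are left out.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_probs(x, gmm):
    """Per-component log(weight * density) for a single frame."""
    weights, means, variances = gmm
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm(frames, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to the frames with plain EM."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialize means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_components)]
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [
        max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    variances = [list(gvar) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in frames:
            logp = frame_log_probs(x, (weights, means, variances))
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [
                sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (x[d] - means[k][d]) ** 2 for r, x in zip(resp, frames)) / nk,
                    var_floor,
                )
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under a GMM."""
    return sum(logsumexp(frame_log_probs(x, gmm)) for x in frames) / len(frames)


def identify(frames, speaker_models):
    """Return the enrolled speaker whose GMM best explains the frames."""
    return max(speaker_models, key=lambda s: avg_log_likelihood(frames, speaker_models[s]))


def toy_frames(center, n, rng):
    """Synthetic stand-in for MFCC frames: unit-variance noise around a center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(1)
train = {
    "spk_a": toy_frames([0.0, 0.0, 0.0], 200, rng),
    "spk_b": toy_frames([4.0, -4.0, 4.0], 200, rng),
}
models = {name: fit_gmm(frames) for name, frames in train.items()}
test_utterance = toy_frames([4.0, -4.0, 4.0], 50, rng)
predicted = identify(test_utterance, models)
print(predicted)
```

Because every frame is scored independently and the scores are averaged, the decision does not depend on what was said or on the order of the frames, which is why this style of model suits text-independent SR.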
Covariance-based features, on the other hand, were first proposed for the task of object detection [24] and were further explored for object detection and tracking [25, 2...

Table of contents

  1. Title Page
  2. Copyright
  3. Contents
  4. Foreword
  5. Acknowledgments
  6. Introduction
  7. Part I: Comparative analysis of methods for speaker identification, speech recognition, and intelligibility modification in the dysarthric speaker population
  8. Part II: New approaches to speech reconstruction and enhancement via conversion of non-acoustic signals
  9. Part III: Use of novel speech diagnostic and therapeutic intervention software for speech enhancement and rehabilitation