
eBook - ePub
The Speech Processing Lexicon
Neurocognitive and Behavioural Approaches
- 266 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
In this book, some of today's leading neurolinguists and psycholinguists provide insight into the nature of phonological processing using behavioural measures, computational modeling, EEG and fMRI. The essays cover a range of topics including categorization, acoustic variability and invariance, underspecification, talker-specificity and machine learning, focusing on the acoustics, perception, acquisition and neural representation of speech.
The Speech Processing Lexicon by Aditi Lahiri and Sandra Kotzor is available in PDF and/or ePUB format, alongside other books in Languages & Linguistics.
Information
Vipul Arora and Henning Reetz
Automatic speech recognition: What phonology can offer
Vipul Arora, Faculty of Linguistics, Philology and Phonetics, University of Oxford
Henning Reetz, Institut für Phonetik, Goethe University Frankfurt
Abstract: This chapter presents phonological features, rather than the phones (or phonemes) typically used, as the underlying representation of speech for automatic speech recognition (ASR). Phonological features offer a number of advantages. Firstly, they can efficiently handle the pronunciation variability found in languages. Secondly, these features form natural classes that represent speech universally, so they offer better ways to transfer the various models involved in ASR across different languages and dialects. Moreover, the ubiquity of the perceptual properties of phonological features is supported by various neurolinguistic experiments and by studies of different languages of the world. Phonological features can thus provide a principled approach to ASR, reducing the amount of training data and computational resources required.
The main challenge is to develop mathematical models that reliably detect these features from the speech signal, and to incorporate them into ASR systems. To this end, we describe some of our implementations. Firstly, we present a digit recognition system that detects the features with the help of neural networks and applies a rule-based feature-to-phoneme mapping. Secondly, we describe a method based on deep neural networks for extracting the features from speech signals; the use of deep learning improves detection accuracy. Thirdly, we present a deep-neural-network-based ASR system that detects features and maps them to phonemes using statistical models. This system performs on par with state-of-the-art ASR systems on the task of phoneme recognition.
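As a rough illustration of what a rule-based feature-to-phoneme mapping could look like, the following Python sketch thresholds per-frame feature scores and picks the phoneme whose feature specification overlaps most with the detected set. It is not the chapter's implementation: the feature inventory `PHONEME_FEATURES`, the function names, the 0.5 threshold and the Jaccard-overlap rule are all assumptions chosen for illustration only.

```python
# Hypothetical, heavily simplified feature inventory (NOT the chapter's).
PHONEME_FEATURES = {
    "s": frozenset({"consonantal", "coronal", "continuant", "strident"}),
    "t": frozenset({"consonantal", "coronal", "plosive"}),
    "n": frozenset({"consonantal", "coronal", "nasal", "sonorant", "voiced"}),
    "m": frozenset({"consonantal", "labial", "nasal", "sonorant", "voiced"}),
    "i": frozenset({"vocalic", "high", "sonorant", "voiced"}),
}


def active_features(scores, threshold=0.5):
    """Return the features whose detector score reaches the threshold.

    `scores` maps feature names to values in [0, 1], e.g. the outputs
    of per-feature neural network detectors (assumed here).
    """
    return frozenset(f for f, p in scores.items() if p >= threshold)


def map_to_phoneme(scores, threshold=0.5):
    """Map detected features to the best-overlapping phoneme.

    Uses Jaccard overlap between the detected feature set and each
    phoneme's specification; returns None if no feature is active.
    """
    detected = active_features(scores, threshold)
    if not detected:
        return None

    def overlap(spec):
        return len(detected & spec) / len(detected | spec)

    return max(PHONEME_FEATURES, key=lambda ph: overlap(PHONEME_FEATURES[ph]))


if __name__ == "__main__":
    frame = {"consonantal": 0.9, "coronal": 0.8, "nasal": 0.7,
             "sonorant": 0.6, "voiced": 0.9, "labial": 0.2}
    print(map_to_phoneme(frame))
```

Because the match is by overlap rather than exact equality, a frame in which one feature detector fails (for instance, a missed "strident") can still map to the intended phoneme, which is one simple way such a mapping might tolerate pronunciation and detection variability.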
1 Introduction
The human faculty of speech has fascinated philosophers, linguists and engineers throughout history. Modern devices for recording and reproducing sound trace their roots back to the phonograph, invented by Edison in 1877. From there, the technology evolved and gave rise to an interest in audio processing, leading in turn to speech analysis and recognition. Spectral analysis (Koenig et al., 1946) and linear predictive coding (Markel & Gray, 1976) laid much of the foundation for visualising and representing the acoustics of speech. Around this time, phoneticians and phonologists developed insights into speech acoustics and came up with ways of characterising sound units for potentially all spoken languages (Chiba & Kajiyama, 1941; Jakobson et al., 1951; Fant, 1960).
In the 1970s, the Advanced Research Projects Agency (ARPA) of the Department of Defense financed the ARPA Speech Understanding Project to boost the development of Automatic Speech Understanding (ASU) technology. The goal was to convert spoken input into an appropriate computer reaction. Note that Automatic Speech Understanding is different from Automatic Speech Recognition (ASR). The latter transcribes an acoustic speech signal into written text, whereas the former produces an appropriate reaction by a machine, for example, retrieving a document from a database. Klatt noted in his report (Klatt, 1977: 1353) that the phonetic transcription performance of Harpy, the best-performing system, was worse than that of the other systems in the competition. This is not surprising, since Harpy used only acoustic spectrum matching techniques without a phonetic or phonemic inventory. This was possible because it did not try to ‘transcribe’ what was said into phonetic labels but used a stochastic Markov chain to match spectral patterns with its internal network of patterns. Its success was mostly based on the restricted syntax of its application, which was decoded in an internal network of possible phrases, and on an efficient beam-search method. That is, the system did not try to transcribe every word that was uttered but rather found matching parts in its network to generate an appropriate reaction. Another reason for its good performance was its avoidance of early ‘hard’ decisions on sounds or words. Additionally, the central processing unit was a network of essentially simple nodes, which all had the same structure and did not need any proprietary rules for each sound. The success of this system over the ‘classical’ phonetic-based systems, which try to transcribe speech first and ...
Table of contents
- Cover
- Title Page
- Copyright
- Table of Contents
- Introduction
- Phonetic categories and phonological features: Evidence from the cognitive neuroscience of language
- On invariance: Acoustic input meets listener expectations
- The invariance problem in the acquisition of non-native phonetic contrasts: From instances to categories
- Symmetry or asymmetry: Evidence for underspecification in the mental lexicon
- Talker-specificity effects in spoken language processing: Now you see them, now you don't
- Processing acoustic variability in lexical tone perception
- Flexible and adaptive processes in speech perception
- Foreign accent syndrome: Phonology or phonetics?
- How category learning occurs in adults and children
- Automatic speech recognition: What phonology can offer
- Fluid semantics: Semantic knowledge is experience-based and dynamic
- Subject index