Computational Linguistics, Speech and Image Processing for Arabic Language
Edited by Neamat El Gayar and Ching Y Suen
eBook - ePub | 288 pages | English
About this book

This book encompasses a collection of topics covering recent advances that are important to the Arabic language in areas of natural language processing, speech and image analysis. This book presents state-of-the-art reviews and fundamentals as well as applications and recent innovations.

The book chapters by top researchers present basic concepts and challenges for the Arabic language in linguistic processing, handwritten recognition, document analysis, text classification and speech processing. In addition, it reports on selected applications in sentiment analysis, annotation, text summarization, speech and font analysis, word recognition and spotting and question answering.

Moreover, it highlights and introduces some novel applications in vital areas for the Arabic language. The book is therefore a useful resource for young researchers who are interested in the Arabic language and are still developing their fundamentals and skills in this area. It is also interesting for scientists who wish to keep track of the most recent research directions and advances in this area.


Contents:

  • Arabic Speech Recognition: Challenges and State of the Art (Sherif Mahdy Abdou and Abdullah M Moussa)
  • Introduction to Arabic Computational Linguistics (Mohsen Rashwan)
  • Challenges in Arabic Natural Language Processing (Khaled Shaalan, Sanjeera Siddiqui, Manar Alkhatib and Azza Abdel Monem)
  • Arabic Recognition Based on Statistical Methods (A Belaïd and A Kacem Echi)
  • Arabic Word Spotting Approaches and Techniques (Muna Khayyat, Louisa Lam and Ching Y Suen)
  • A'rib — A Tool to Facilitate School Children's Ability to Analyze Arabic Sentences Syntactically (Mashael Almedlej and Aqil M Azmi)
  • Semi-Automatic Data Annotation, POS Tagging and Mildly Context-Sensitive Disambiguation: The eXtended Revised AraMorph (XRAM) (Giuliano Lancioni, Laura Garofalo, Raoul Villano, Francesca Romana Romani, Marta Campanelli, Ilaria Cicola, Ivana Pepe, Valeria Pettinari and Simona Olivieri)
  • WeightedNileULex: A Scored Arabic Sentiment Lexicon for Improved Sentiment Analysis (Samhaa R El-Beltagy)
  • Islamic Fatwa Request Routing via Hierarchical Multi-Label Arabic Text Categorization (Reda Zayed, Mohamed Farouk and Hesham Hefny)
  • Arabic and English Typeface Personas (Shima Nikfal and Ching Y Suen)
  • End-to-End Lexicon Free Arabic Speech Recognition Using Recurrent Neural Networks (Abdelrahman Ahmedy, Yasser Hifny, Khaled Shaalan and Sergio Toral)
  • Bio-Inspired Optimization Algorithms for Improving Artificial Neural Networks: A Case Study on Handwritten Letter Recognition (Ahmed A Ewees and Ahmed T Sahlol)


Readership: Students and researchers in Arabic document and speech/image/text processing.



Chapter 1

Arabic Speech Recognition: Challenges and State of the Art

Sherif Mahdy Abdou1 and Abdullah M. Moussa2
1Faculty of Computers and Information,
Cairo University, Giza 12613, Egypt
[email protected]
2Faculty of Engineering,
Cairo University, Giza 12613, Egypt
[email protected]
The Arabic language has several features, such as its phonology and syntax, that make it an easy language for developing automatic speech recognition systems. Many standard techniques for acoustic and language modeling, such as context-dependent acoustic models and n-gram language models, can be applied to Arabic directly. Some aspects of the language, such as the nearly one-to-one letter-to-phone correspondence, make the construction of the pronunciation lexicon even easier than in other languages. The most difficult challenges in developing speech recognition systems for Arabic are the dominance of non-diacritized text material, the many dialects, and the morphological complexity. In this chapter, we review the efforts that have been made to handle these challenges. These include methods for the automatic generation of diacritics for Arabic text and for word pronunciation disambiguation. We also review the approaches used to handle the limited speech and text resources of the different Arabic dialects. Finally, we review the approaches used to deal with the high degree of affixation and derivation that contributes to the explosion of different word forms in Arabic.

1. Introduction

Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. The last decade has witnessed substantial advances in speech recognition technology, which, combined with the increase in computational power and storage capacity, has resulted in a variety of commercial products already on the market.
Arabic is the largest living Semitic language in terms of number of speakers. Around 300 million people use Arabic as their first language, and it is the fourth most widely used language by number of first-language speakers.
Many serious efforts have been made to develop Arabic speech recognition systems.1,2,3 Many aspects of Arabic, such as the phonology and the syntax, do not present problems for Automatic Speech Recognition (ASR). Standard, language-independent techniques for acoustic and pronunciation modeling, such as context-dependent phones, can easily be applied to model the acoustic-phonetic properties of Arabic. Some aspects of recognizer training are even easier than in other languages, in particular the construction of the pronunciation lexicon, since there is a nearly one-to-one letter-to-phone correspondence. The most difficult problems in developing high-accuracy speech recognition systems for Arabic are the predominance of non-diacritized text material, the enormous dialectal variety, and the morphological complexity.
In the following sections of this chapter we start by describing the main components of ASR systems and the major approaches that have been introduced to develop each of them. Then, we review previous efforts to develop Arabic ASR systems. Finally, we discuss the major challenges of Arabic ASR and the solutions proposed to overcome them, with a summary of the performance of state-of-the-art systems.

2. The Automatic Speech Recognition System Components

The goal of an ASR system is to find the most probable sequence of words W = (w_1, w_2, …) belonging to a fixed vocabulary, given a set of acoustic observations X = (x_1, x_2, …, x_T). Following the Bayesian approach applied to ASR, as shown in Ref. 4, the best estimate of the word sequence is given by:
\hat{W} = \arg\max_{W} P(W)\, P(X \mid W)    (1)
To generate an output, the speech recognizer basically has to perform the following operations, as shown in Fig. 1:
  • Extract acoustic observations (features) from the spoken utterance.
  • Estimate P(W), the probability of an individual word sequence occurring, regardless of the acoustic observations. This is called the language model.
  • Estimate P(X|W), the likelihood that the particular set of features originates from a certain sequence of words. This involves both the acoustic model and the pronunciation lexicon; the latter is perhaps the only language-dependent component of an ASR system.
  • Find the word sequence that maximizes (1). This is referred to as search or decoding; a toy illustration of this decision rule is sketched after this list.
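To make the decision rule in Eq. (1) concrete, here is a minimal sketch in Python with a handful of hypothetical word sequences and made-up log-probabilities; a real decoder searches an enormous hypothesis space defined by the lexicon, the acoustic model and the language model rather than a small fixed dictionary.

```python
import math

# Hypothetical candidate word sequences with made-up scores. In practice the
# scores come from the language model (P(W)) and the acoustic model (P(X|W)).
candidates = {
    "kataba alwaladu aldarsa":  {"log_p_w": math.log(0.040), "log_p_x_w": -250.0},
    "kataba alwaladu aldurusa": {"log_p_w": math.log(0.010), "log_p_x_w": -248.5},
    "katabat alwaladu aldarsa": {"log_p_w": math.log(0.001), "log_p_x_w": -252.0},
}

def decode(candidates):
    """Return the hypothesis W maximizing log P(W) + log P(X|W), i.e. Eq. (1)."""
    return max(candidates,
               key=lambda w: candidates[w]["log_p_w"] + candidates[w]["log_p_x_w"])

print(decode(candidates))
```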
Fig. 1. The ASR system main architecture.
The two terms P(W) and P(X|W) and the maximization operation constitute the basic ingredients of a speech recognition system. The goal is to determine the best word sequence given a speech input X. Strictly speaking, X is not the raw speech input but a set of features derived from the speech. Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) features are the most widely used. The acoustic and language models and the search operation are discussed below.
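As a concrete illustration of the feature extraction step, the following minimal sketch computes MFCC features with the librosa library; the file name utterance.wav, the 16 kHz sampling rate and the frame settings are assumptions made for the example, and any standard ASR front-end would serve equally well.

```python
import librosa

# Load a hypothetical utterance; sr=16000 resamples to a typical ASR rate.
signal, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs per frame with a 25 ms window and 10 ms hop, a common ASR setup.
mfcc = librosa.feature.mfcc(
    y=signal,
    sr=sr,
    n_mfcc=13,
    n_fft=int(0.025 * sr),       # 25 ms analysis window
    hop_length=int(0.010 * sr),  # 10 ms frame shift
)

# mfcc has shape (13, T): one 13-dimensional observation x_t per frame.
print(mfcc.shape)
```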

2.1. Pronunciation lexicon

The pronunciation lexicon is basically a list where each word in the vocabulary is mapped into a sequence (or multiple sequences) of phonemes. This allows modeling a large number of words using a fixed number of phonemes. Sometimes whole word models are used. In this case the pronunciation lexicon will be a trivial one. The pronunciation lexicon is language-dependent and for a large vocabulary (several thousand words) might require a large effort. We will discuss this for Arabic in the next sections.
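To illustrate, a pronunciation lexicon can be viewed as a simple mapping from each vocabulary word to one or more phoneme sequences. The toy entries below are hypothetical, using a rough romanization of a few Arabic words, and are not drawn from any particular system.

```python
# A toy pronunciation lexicon: each vocabulary word maps to one or more
# pronunciations, each a sequence of phoneme symbols. Real lexicons for
# large-vocabulary systems contain tens or hundreds of thousands of entries.
lexicon = {
    "kataba":  [["k", "a", "t", "a", "b", "a"]],           # "he wrote"
    "kitAb":   [["k", "i", "t", "aa", "b"]],                # "book"
    "AlkitAb": [["a", "l", "k", "i", "t", "aa", "b"],       # "the book"
                ["i", "l", "k", "i", "t", "aa", "b"]],      # dialectal variant
}

def pronunciations(word):
    """Return all phoneme sequences listed for a word; raises KeyError if OOV."""
    return lexicon[word]

print(pronunciations("AlkitAb"))
```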

2.2. Acoustic model

The most popular acoustic models are the so-called Hidden Markov Models (HMMs). Each phoneme (or, in general, each unit) is modeled using an HMM. An HMM consists of a set of states, transitions, and output distributions,4 as shown in Fig. 2.
Fig. 2. HMM Phone Model.
The HMM states are associated with emission probability density functions. These densities are usually given by a mixture of diagonal covariance Gaussians as expressed in equation (2):
b_{S_i}(x) = \sum_{j} c_{ij}\, \mathcal{N}(x; \mu_{ij}, \Sigma_{ij})    (2)
where j ranges over the Gaussian densities in the mixture of state S_i, and N(·) is the value of the chosen component Gaussian density function for feature vector x. The parameters of the model (state transition probabilities and output distribution parameters, e.g. the means and variances of a Gaussian) are estimated automatically from training data. Using only one model per phone is usually not accurate enough, so several models are trained for each phone depending on its context. For example, a tri-phone model depends on the immediate left and right contexts of a phone: tri-phone A with left context b and right context n (referred to as /b-A-n/) has a different model than tri-phone A with left context t and right context m (referred to as /t-A-m/). For a total number of phones P, there are P^3 tri-phones, and for N states per model, N·P^3 states in total. The idea can be generalized to larger contexts, e.g. quinphones. This typically leads to a very large number of parameters, so in practice context-dependent phones are clustered to reduce it. Perhaps the most important aspect in designing a speech recognition system is finding the right number of states for the given amount of training data. Extensive research has addressed this point, with methods ranging from simple phonetic rules to data-driven clustering. The most popular technique is decision tree clustering.5 In this method, both context questions and a likelihood metric are used to cluster the data for each phonetic state, as shown in Fig. 3. The depth of the tree can be used to trade off accuracy against robustness.
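As a minimal numerical sketch of the diagonal-covariance Gaussian mixture emission density in Eq. (2), the Python snippet below evaluates log b_i(x) for a single HMM state; the mixture weights, means, and variances are made-up illustrative values.

```python
import numpy as np

def log_gmm_density(x, weights, means, variances):
    """Log of Eq. (2): log sum_j c_ij N(x; mu_ij, diag(sigma^2_ij))."""
    x = np.asarray(x, dtype=float)
    # Log-density of each diagonal-covariance Gaussian component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_components = np.log(weights) + log_norm + log_exp
    # Log-sum-exp over mixture components for numerical stability.
    m = np.max(log_components)
    return m + np.log(np.sum(np.exp(log_components - m)))

# Toy state with 2 mixture components over 3-dimensional features.
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0, 0.0],
                  [1.0, -1.0, 0.5]])
variances = np.array([[1.0, 1.0, 1.0],
                      [0.5, 0.5, 0.5]])

print(log_gmm_density([0.1, -0.2, 0.3], weights, means, variances))
```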
Once the context-dependent states are clustered, it remains to assign a probability distribution to each clustered state. Gaussian mixtures are the most popular choice in modern speech recognition systems. The parameters of the Gaussians are estimated to maximize the likelihood of the training data (so-called maximum likelihood (ML) estimation). For HMMs, ML estimation is achieved by the forward-backward, or Baum-Welch, algorithm.
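As a small illustration of ML estimation with the Baum-Welch algorithm, the sketch below fits a Gaussian-mixture HMM to synthetic feature vectors using the hmmlearn library; this is a stand-in for a full ASR toolkit, and the random data, three states and two mixture components are assumptions made purely for the example.

```python
import numpy as np
from hmmlearn import hmm

# Synthetic "feature" sequences standing in for MFCC frames of two utterances.
rng = np.random.default_rng(0)
utt1 = rng.normal(size=(120, 13))
utt2 = rng.normal(size=(90, 13))
X = np.concatenate([utt1, utt2])
lengths = [len(utt1), len(utt2)]

# A 3-state HMM with 2-component diagonal-covariance Gaussian mixtures per
# state, trained with Baum-Welch (EM) to maximize the training-data likelihood.
model = hmm.GMMHMM(n_components=3, n_mix=2, covariance_type="diag", n_iter=20)
model.fit(X, lengths)

print(model.score(utt1))  # log-likelihood of the first sequence under the model
```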
Fig. 3. Decision tree for classifying the second state of K-triphone HMM.
Although ML remained the preferred training method for a long time, discriminative training techniques have recently taken over, and it has been demonstrated that they can lead to superior performance. However, this comes at the expense of a more complex training procedure.6 There are several discriminative training criteria, such as Maximum Mutual Information (MMI), Minimum Classification Error (MCE), Minimum Phone Error (MPE) and, most recently, maximum margin methods. All of these techniques share the idea of using the correct transcription together with a set of competing hypotheses, and they estimate the model parameters to "discriminate" the correct hypothesis from the competing ones. The competing hypotheses are usually obtained from a lattice, which in turn requires decoding the training data. Model estimation is most widely done using the so-called extended Baum-Welch (EBW) estimation.7
More recently, a better acoustic model was introduced as a hybrid of HMMs and Deep Neural Networks (DNN). The Gaussian Mixture Models (GMM) are replaced with neural networks with a large number of hidden layers, as shown in Fig. 4.
Fig. 4. HMM-DNN Model.
DNNs have a higher modeling capacity per parameter than GMMs, and they also have a fairly efficient training procedure that combines unsupervised generative learning for feature discovery with a subsequent stage of supervised learning that fine-tunes the features to optimize discrimination. The Context-Dependent (CD) DNN-HMM hybrid model, as shown in Ref. 8, has been successfully applied to large-vocabulary speech recognition tasks and can cut the word error rate by up to one third on challenging conversational speech transcription tasks compared to discriminatively trained conventional CD-GMM-HMM systems.
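To give a flavor of the hybrid model, the sketch below defines a small feed-forward network in PyTorch that maps a spliced window of acoustic frames to posteriors over clustered context-dependent (senone) states; the layer sizes, the number of senones and the splicing context are hypothetical placeholders, and a production CD-DNN-HMM would be larger and trained on state alignments produced by a GMM-HMM system.

```python
import torch
import torch.nn as nn

N_SENONES = 3000   # hypothetical number of clustered CD states
FRAME_DIM = 40     # e.g. 40 acoustic features per frame
CONTEXT = 11       # 11 spliced frames (5 left + current + 5 right)

# A small DNN acoustic model: input is a spliced window of frames, output is
# a distribution over senones, used in place of GMM emission likelihoods.
dnn = nn.Sequential(
    nn.Linear(FRAME_DIM * CONTEXT, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, N_SENONES),           # logits; softmax gives P(state | x)
)

# One training step on a dummy batch of 32 spliced frames with random labels.
x = torch.randn(32, FRAME_DIM * CONTEXT)
labels = torch.randint(0, N_SENONES, (32,))
loss = nn.functional.cross_entropy(dnn(x), labels)
loss.backward()

# At decoding time, log P(x | state) is approximated by
# log softmax(dnn(x))[state] - log P(state)  (the "scaled likelihood" trick).
```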
While the above summarizes how to train models, it remains to discuss the training data. Of course, using more data allows larger and hence more accurate models, leading to better performance. However, data collection and transcription is a tedious and costly process. For this reason, a technique called unsupervised, or better, lightly supervised training is becoming very popular. First, several hundred hours of speech are used to train a model. The model, together with an appropriate confidence measure, can then be used to automatically transcribe thousands of hours of data. The new data can then be used to train a larger model. All the above techniques (and more) are implemented in the so-cal...

Table of contents

  1. Cover
  2. Halftitle
  3. Title
  4. Copyright
  5. Contents
  6. Preface
  7. Chapter 1: Arabic Speech Recognition: Challenges and State of the Art
  8. Chapter 2: Introduction to Arabic Computational Linguistics
  9. Chapter 3: Challenges in Arabic Natural Language Processing
  10. Chapter 4: Arabic Recognition Based on Statistical Methods
  11. Chapter 5: Arabic Word Spotting Approaches and Techniques
  12. Chapter 6: A‘rib — A Tool to Facilitate School Children’s Ability to Analyze Arabic Sentences Syntactically
  13. Chapter 7: Semi-Automatic Data Annotation, POS Tagging and Mildly Context-Sensitive Disambiguation: The eXtended Revised AraMorph (XRAM)
  14. Chapter 8: WeightedNileULex: A Scored Arabic Sentiment Lexicon for Improved Sentiment Analysis
  15. Chapter 9: Islamic Fatwa Request Routing via Hierarchical Multi-Label Arabic Text Categorization
  16. Chapter 10: Arabic and English Typeface Personas
  17. Chapter 11: End-to-End Lexicon Free Arabic Speech Recognition Using Recurrent Neural Networks
  18. Chapter 12: Bio-Inspired Optimization Algorithms for Improving Artificial Neural Networks: A Case Study on Handwritten Letter Recognition
  19. Index