Hands-On Natural Language Processing with Python
eBook - ePub

Hands-On Natural Language Processing with Python

A practical guide to applying deep learning architectures to your NLP applications

  1. 312 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About this book

Foster your NLP applications with the help of deep learning, NLTK, and TensorFlow

Key Features

  • Weave neural networks into linguistic applications across various platforms
  • Perform NLP tasks and train models using NLTK and TensorFlow
  • Boost your NLP models with strong deep learning architectures such as CNNs and RNNs

Book Description

Natural language processing (NLP) has found its application in various domains, such as web search, advertisements, and customer service, and with the help of deep learning, we can enhance its performance in these areas. Hands-On Natural Language Processing with Python teaches you how to leverage deep learning models for performing various NLP tasks, along with best practices in dealing with today's NLP challenges.

To begin with, you will understand the core concepts of NLP and deep learning, such as Convolutional Neural Networks (CNNs), recurrent neural networks (RNNs), semantic embedding, Word2vec, and more. You will learn how to perform NLP tasks with neural networks, training and deploying them in your NLP applications. You will get accustomed to using RNNs and CNNs in various application areas, such as text classification and sequence labeling, which are essential in the application of sentiment analysis, customer service chatbots, and anomaly detection. You will be equipped with practical knowledge in order to implement deep learning in your linguistic applications using Python's popular deep learning library, TensorFlow.

By the end of this book, you will be well versed in building deep learning-backed NLP applications, along with overcoming NLP challenges with best practices developed by domain experts.

What you will learn

  • Implement semantic embedding of words to classify and find entities
  • Train word-vector representations so that arithmetic operations on them become meaningful
  • Train a deep learning model to classify tweets and news articles
  • Implement a question-answering model combining search and RNN models
  • Train models on various text classification datasets using CNNs
  • Implement WaveNet, a deep generative model, to produce natural-sounding speech
  • Convert voice to text and text to voice
  • Train a speech-to-text model using DeepSpeech

Who this book is for

Hands-On Natural Language Processing with Python is for you if you are a developer, machine learning engineer, or NLP engineer who wants to build a deep learning application that leverages NLP techniques. This comprehensive guide is also useful for deep learning users who want to extend their deep learning skills to building NLP applications. All you need is the basics of machine learning and Python to enjoy the book.

Hands-On Natural Language Processing with Python by Rajesh Arumugam and Rajalingappaa Shanmugamani is available in PDF and ePUB format, along with other popular books in Computer Science & Natural Language Processing.


Text-to-Speech Using Tacotron

Text-to-speech (TTS) is the act of converting text into intelligible and natural speech. Before we delve into deep learning approaches to handle TTS, we should ask ourselves the following questions: what are TTS systems for? And why do we need them in the first place?
Well, there are many use cases for TTS. One of the most obvious is that it allows blind people to listen to written content. Indeed, Braille-based books, devices, or signs are not always available, and blind people can't always have someone read to them. In the near future, there might be smart glasses that can describe the surrounding environment and read urban signs and text-based indications to their users.
Many people struggle from childhood with learning disabilities like dyslexia. Robust TTS systems can help them on a daily basis, increasing their productivity at school or work, for instance.
Also, related to the area of learning, it is commonly proposed that different individuals have different preferred styles of absorbing knowledge. For instance, there are those that have great visual memory, those that more easily retain information they have heard, and those that rely more on their kinesthetic memory (memory associated with physical movements). TTS systems can help auditory learners take advantage of that particular way of learning.
In our increasingly fast-paced world, multitasking often becomes a necessity. It is not rare to see a person walking in the street and reading some content displayed on their smartphone at the same time. Someone might also be cooking and following recipe instructions on a touchscreen device. But what if the lack of visual attention leads to an accident (in the first scenario), and what if dirty and sticky fingers prevent an aspiring chef from scrolling down to read the rest of the recipe (in the second scenario)? Again, TTS is a natural solution to avoid these inconveniences.
As you can see, TTS applications have the potential to enhance many aspects of our everyday lives.
In this chapter, we will cover the following topics:
  • A quick overview of the field
  • A few recent deep learning approaches for TTS
  • A step-by-step implementation of Tacotron—an end-to-end deep learning model

Overview of text to speech

Here, we will give some general information about TTS algorithms. It is not our ambition to thoroughly tackle the different components of the field, which is quite a complex task and requires cross-domain knowledge in areas like linguistics and signal processing.
We will stick to the following high-level questions: what makes a TTS system good or bad? How is it evaluated? What are some traditional techniques, and why does the field need to move toward deep learning? We will also prepare for the next sections by giving a few basic pieces of information on spectrograms.
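Since spectrograms come up repeatedly in the later sections, here is a minimal sketch of how a magnitude spectrogram is computed from a raw waveform. The frame length, hop size, and 440 Hz test tone below are illustrative choices, not values prescribed by this chapter:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Split a waveform into overlapping frames, apply a Hann window to
    each frame, and take the magnitude of its FFT: one column per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

# One second of a 440 Hz tone sampled at 8 kHz: the energy concentrates
# in a single frequency row of the spectrogram.
sr = 8000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (129, 61)
```

Deep learning TTS models such as Tacotron predict spectrogram frames like these, which are then converted back into a waveform.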

Naturalness versus intelligibility

The quality of a TTS system is traditionally assessed through two criteria: naturalness and intelligibility. This is motivated by the fact that people are not only sensitive to what the audio content is, but also to how that content is delivered. Basically, we want a TTS system that can produce clear audio content in a human-like way. More precisely, intelligibility is about the audio quality or cleanness, and naturalness is about communicating the message with the proper pronunciation, timing, and range of emotions.
With a highly intelligible system, it is effortless for the user to distinguish between different words. On the other hand, when intelligibility is low, some words might be confused with others or difficult to identify, and the separation between words might be unclear. In most scenarios, intelligibility is the more important parameter of the two. That is because conveying a clear and unambiguous message to the user is often the priority, whether it sounds natural or not. If a user can't understand the generated audio, it is a failure. Therefore, it is necessary to have a minimum level of intelligibility, before we try to optimize the naturalness of the generated speech.
When a TTS algorithm has a high-level of naturalness, the produced content is so smooth that the user feels like another human being is talking to them. It is hardly possible to tell that the speech was artificially created. On the other hand, a discontinuous, monotonous, and lifeless intonation is typical of unnatural speech.
Note that these are inherently subjective criteria, so they cannot be captured by objective metrics. Because of the nature of the problem, a TTS system can ultimately only be evaluated by human listeners.

How is the performance of a TTS system evaluated?

A subjective measure of sound quality, the mean opinion score (MOS), is one of the most commonly used tests for assessing the performance of a TTS algorithm. Usually, several native speakers are asked to give a score of naturalness, from 1 (bad quality) to 5 (excellent quality), and the mean of those scores is the MOS. Audio samples recorded by professionals typically have an MOS of around 4.55, as shown in the WaveNet: A Generative Model for Raw Audio paper that will be presented later in this chapter (https://arxiv.org/abs/1609.03499).
This way of benchmarking TTS algorithms is not entirely satisfactory, however. For instance, it does not allow for a rigorous comparison of different algorithms presented in different papers. Indeed, algorithm A is not necessarily evaluated by the same sample of listeners as algorithm B. Since different individuals are likely to have different standards regarding what natural speech sounds like, if A has an MOS of 4.2 and B has an MOS of 4.1, it does not necessarily mean that A is better than B (unless they are evaluated within the same study, by the same group of individuals). Besides, the sample size, as well as the population from which the sample of listeners is drawn, are difficult to standardize, and might make a difference.
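The MOS itself is nothing more than the arithmetic mean of the listeners' ratings. A minimal sketch, with hypothetical scores:

```python
def mean_opinion_score(ratings):
    """Average listener ratings (1 = bad quality ... 5 = excellent quality)
    into a single mean opinion score."""
    if not ratings or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return sum(ratings) / len(ratings)

# Hypothetical scores from ten native-speaker listeners for one sample.
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4]
print(mean_opinion_score(ratings))  # 4.0
```

The statistical caveats above apply regardless of how the mean is computed: the score is only comparable across systems rated by the same listener pool.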

Traditional techniques – concatenative and parametric models

Before the rise of deep learning in TTS tasks, either concatenative or parametric models were used.
To create concatenative models, one needs to record high-quality audio content, split it into small chunks, and then recombine these chunks to form new speech. With parametric models, we have to create the features with signal processing techniques, which requires some extra domain knowledge.
Concatenative models tend to be intelligible, but lack naturalness. They require a huge dataset that takes into account as many human-generated audio units as possible. Therefore, they usually take a long time to develop.
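The recombination step can be sketched in a toy form. The sine-tone "units" and fixed cross-fade length below are stand-ins for the recorded speech chunks and unit-selection logic of a real concatenative system:

```python
import numpy as np

# A toy "unit database": pretend each entry was cut from recorded speech.
# Real systems store thousands of units with pitch and duration metadata.
sr = 8000

def tone(freq, ms):
    t = np.arange(int(sr * ms / 1000)) / sr
    return np.sin(2 * np.pi * freq * t)

units = {"he": tone(220, 80), "llo": tone(180, 120), "world": tone(150, 200)}

def synthesize(sequence, xfade=40):
    """Concatenate pre-recorded units, cross-fading `xfade` samples
    at each joint to soften the audible seams."""
    out = units[sequence[0]].copy()
    ramp = np.linspace(0.0, 1.0, xfade)
    for name in sequence[1:]:
        nxt = units[name]
        out[-xfade:] = out[-xfade:] * (1 - ramp) + nxt[:xfade] * ramp
        out = np.concatenate([out, nxt[xfade:]])
    return out

speech = synthesize(["he", "llo", "world"])
```

Even with cross-fading, joints between units recorded in different contexts remain audible, which is one reason concatenative output tends to sound unnatural despite being intelligible.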
In general, parametric models perform worse than concatenative models. They may lack intelligibility, and do not sound particularly natural.

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Packt Upsell
  4. Foreword
  5. Contributors
  6. Preface
  7. Getting Started
  8. Text Classification and POS Tagging Using NLTK
  9. Deep Learning and TensorFlow
  10. Semantic Embedding Using Shallow Models
  11. Text Classification Using LSTM
  12. Searching and DeDuplicating Using CNNs
  13. Named Entity Recognition Using Character LSTM
  14. Text Generation and Summarization Using GRUs
  15. Question-Answering and Chatbots Using Memory Networks
  16. Machine Translation Using the Attention-Based Model
  17. Speech Recognition Using DeepSpeech
  18. Text-to-Speech Using Tacotron
  19. Deploying Trained Models
  20. Other Books You May Enjoy