Learning Objectives
By the end of this chapter, you will be able to:
- Describe natural language processing and its applications
- Explain different text preprocessing techniques
- Perform text preprocessing on text corpora
- Explain the functioning of Word2Vec and GloVe word embeddings
- Generate word embeddings using Word2Vec and GloVe
- Use the NLTK, Gensim, and Glove-Python libraries for text preprocessing and generating word embeddings
This chapter aims to equip you with knowledge of the basics of natural language processing and experience with the various text preprocessing techniques used in Deep Learning.
Introduction
Welcome to Deep Learning for Natural Language Processing. This book guides you in understanding and optimizing deep learning techniques for natural language processing, a field that brings us closer to generalized artificial intelligence. You will work through the concepts of natural language processing, including its applications and implementations, and learn how deep neural networks work and how to use them to enable machines to understand natural language.
The Basics of Natural Language Processing
To understand what natural language processing is, let's break the term into its two parts:
- Natural language is a form of written and spoken communication that has developed organically and naturally.
- Processing means analyzing and making sense of input data with computers.
Figure 1.1: Natural language processing
Therefore, natural language processing is the machine-based processing of human communication. It aims to teach machines how to process and understand human language, thereby providing an easy channel of communication between humans and machines.
For example, the personal voice assistants in our phones and smart speakers, such as Alexa and Siri, are a result of natural language processing. They are designed not only to understand what we say to them but also to act on it and respond with feedback. Natural language processing algorithms help these technologies communicate with humans.
The key point in this definition of natural language processing is that the communication needs to occur in the natural language of humans. We have been communicating with machines for decades by writing programs to perform certain tasks and executing them. However, these programs are written in languages that are not natural languages, because they are not forms of spoken communication and did not develop naturally or organically. These languages, such as Java, Python, C, and C++, were created with machines in mind, the constant consideration being, "What will the machine be able to understand and process easily?"
While Python is more user-friendly, and therefore easier for humans to learn and write code in, the basic point remains the same: to communicate with a machine, humans must learn a language that the machine can understand.
Figure 1.2: Venn diagram for natural language processing
The purpose of natural language processing is the opposite. Rather than having humans conform to the ways of machines and learn how to communicate with them effectively, natural language processing enables machines to conform to humans and learn their way of communicating. This makes more sense, since the aim of technology is to make our lives easier.
To clarify this with an example, your first ever program was probably a piece of code that asked the machine to print 'hello world'. This was you conforming to the machine and asking it to execute a task in a language that it understood. Asking your voice assistant to say 'hello world' by voicing this command to it, and having it say 'hello world' back to you, is an example of the application of natural language processing, because you are communicating with a machine in your natural language (in this case, English). The machine is conforming to your form of communication, understanding what you're saying, processing what you're asking it to do, and then executing the task.
Importance of Natural Language Processing
The following figure illustrates the various sections of the field of artificial intelligence:
Figure 1.3: Artificial intelligence and some of its subfields
Along with machine learning and deep learning, natural language processing is a subfield of artificial intelligence, and because it deals with natural language, it's actually at the intersection of artificial intelligence and linguistics.
As mentioned, natural language processing is what enables machines to understand the language of humans, allowing an efficient channel of communication between the two. However, there is another reason natural language processing is necessary: like machines in general, machine learning and deep learning models operate on numerical data. Numerical data is hard for humans to produce naturally; imagine us talking in numbers rather than words. So, natural language processing works with textual data and converts it into numerical data, enabling machine learning and deep learning models to be fitted on it. It thus bridges the communication gap between humans and machines by taking the spoken and written forms of human language and converting them into data that machines can understand. Thanks to natural language processing, a machine can make sense of, answer questions about, solve problems using, and communicate in a natural language, among other things.
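To make "converting textual data into numerical data" concrete, here is a minimal sketch assuming a simple bag-of-words representation, which is only one of several possible conversions (the Word2Vec and GloVe embeddings covered later in this chapter are another). The helpers tokenize, build_vocabulary, and vectorize are hypothetical names written for illustration; they are not functions from NLTK or Gensim.

```python
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer: lowercase, then split on whitespace
    return text.lower().split()


def build_vocabulary(corpus):
    # Assign each distinct word in the corpus a column index (alphabetical order)
    words = sorted({word for doc in corpus for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(doc, vocabulary):
    # Count occurrences of each vocabulary word; words not in the vocabulary are ignored
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]


corpus = ["hello world", "hello machine", "the machine says hello world"]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(doc, vocabulary) for doc in corpus]

print(vocabulary)  # → {'hello': 0, 'machine': 1, 'says': 2, 'the': 3, 'world': 4}
print(vectors)     # → [[1, 0, 0, 0, 1], [1, 1, 0, 0, 0], [1, 1, 1, 1, 1]]
```

Each sentence is now a fixed-length list of numbers that a machine learning model could be fitted on. Bag-of-words discards word order and says nothing about how words relate in meaning, a limitation that word embeddings such as Word2Vec and GloVe are designed to address.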
Capabilities of Natural Language Processing
Natural language processing has many real-world applications that benefit the lives of humans. These applications fall under three broad capabilities of natural language processing:
- Speech Recognition
The machine is able to recognize a natural language in its spoken form and convert it into a textual form. An example of this is dictation on your smartphone: you can enable dictation and speak to your phone, and it will convert what you say into text.
- Natural Language Understanding
The machine is able to understand a natural language in both its spoken and written form. If given a command, the machine is able to understand and execute it. An example of this would be saying 'Hey Siri, call home' to Siri on your iPhone and having Siri automatically call 'home' for you.
- Natural Language Generation
The machine is able to generate natural language itself. An example of this is asking 'Siri, what time is it?' to Siri on your iPhone and Siri replying with the time – 'It's 2:08pm'.
These three capabilities are used to accomplish and automate many tasks. Let's look at some of the areas natural language processing contributes to, and how.
Note
A collection of textual data is known as a corpus (plural: corpora).
Applications of Natural Language Processing
The following figure depicts the general application areas of natural language processing:
Figure 1.4: Application areas of natural language processing
- Automatic text summarization
This involves processing corpora to provide a summary.
- Translation
This involves tools that translate text to and from different languages, for example, Google Translate.
- Sentiment analysis
This is also known as emotional artificial intelligence or opinion mining, and it is the process of identifying, extracting, and quantifying emotions and affective states from corpora, both written and spoken. Sentiment analysis tools are used to process things such as customer reviews and social media posts to understand emotional res...