eBook - ePub

Python Natural Language Processing

Name: Python Natural Language Processing
ISBN: 9781787285521

Jalaj Thanaki,

486 pages
English

About this book

Leverage the power of machine learning and deep learning to extract information from text data

About This Book

• Implement Machine Learning and Deep Learning techniques for efficient natural language processing
• Get started with NLTK and implement NLP in your applications with ease
• Understand and interpret human languages with the power of text analysis via Python

Who This Book Is For

This book is intended for Python developers who wish to start with natural language processing and want to make their applications smarter by implementing NLP in them.

What You Will Learn

• Focus on Python programming paradigms, which are used to develop NLP applications
• Understand corpus analysis and different types of data attributes
• Learn NLP using Python libraries such as NLTK, Polyglot, SpaCy, Stanford CoreNLP, and so on
• Learn about Feature Extraction and Feature Selection as part of Feature Engineering
• Explore the advantages of vectorization in Deep Learning
• Get a better understanding of the architecture of a rule-based system
• Optimize and fine-tune Supervised and Unsupervised Machine Learning algorithms for NLP problems
• Identify Deep Learning techniques for Natural Language Processing and Natural Language Generation problems

In Detail

This book starts off by laying the foundation for Natural Language Processing and explaining why Python is one of the best options for building an NLP-based expert system, with advantages such as community support, availability of frameworks, and so on. It then gives you a better understanding of the freely available forms of corpus and the different types of dataset. After this, you will know how to choose a dataset for natural language processing applications and find the right NLP techniques to process sentences in datasets and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them.

During the course of the book, you will explore the semantic as well as syntactic analysis of text. You will understand how to resolve various ambiguities in processing human language and will come across various scenarios while performing text analysis.

You will learn the very basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly understand sentences and language parts. You will learn the power of Machine Learning and Deep Learning to extract information from text data.

By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.

Style and approach

This book teaches readers various aspects of natural language processing using NLTK. It takes the reader from the basic to the advanced level in a smooth way.

Information

Publisher: Packt Publishing
Year: 2017
eBook ISBN: 9781787285521
Topic: Computer Science
Subtopic: Data Processing

Machine Learning for NLP Problems

We have seen the basic and the advanced levels of feature engineering, and how rule-based systems can be used to develop NLP applications. In this chapter, we will develop NLP applications using machine learning (ML) algorithms. We will begin with the basics of ML and then walk through the basic development steps of NLP applications that use ML, focusing mostly on how ML algorithms are used in the NLP domain. Then, we will move on to feature selection. We will also take a look at hybrid models and post-processing techniques.

The outline of this chapter is as follows:

Understanding the basics of machine learning
Development steps for NLP application
Understanding ML algorithms and other concepts
Hybrid approaches for NLP applications

Let's explore the world of ML!

Understanding the basics of machine learning

First of all, let's understand what machine learning is. Traditionally, programming is all about defining every step needed to reach a certain predefined outcome. We spell out each minute step in a programming language that helps us achieve that outcome. To give you a basic understanding, I'll take a general example. Suppose that you want to write a program that will help you draw a face. You may first write the code that draws the left eye, then the code that draws the right eye, then the nose, and so on. Here, you are writing the code for each facial attribute, but ML flips this approach. In ML, we define the outcome and the program learns the steps to achieve it. So, instead of writing code for each facial attribute, we provide hundreds of samples of human faces to the machine. We expect the machine to learn the steps that are needed to draw a human face so that it can draw some new human faces. Apart from this, when we provide a new human face as well as an animal face, it should recognize which one looks like a human face.

Let's take some general examples. If you want to recognize the valid license plates of certain states using traditional programming, you need to write code that specifies what the shape of the license plate should be, what the color should be, what the fonts are, and so on. Manually coding every single property of the license plate quickly becomes too lengthy. Using ML, we provide some example license plates to the machine, and the machine learns the steps so that it can recognize a new valid license plate.

Let's assume that you want to make a program that can play the game Super Mario and win it as well. Defining each game rule by hand is too difficult. Instead, we usually define a goal, such as getting to the endpoint without dying, and the machine learns all the steps needed to reach the endpoint.

Sometimes, problems are so complicated that even we don't know what steps should be taken to solve them. For example, suppose we are a bank and we suspect that some fraudulent activities are happening, but we are not sure how to detect them, or we don't even know what to look for. We can provide a log of all the user activities to the machine and have it find the users who are not behaving like the rest of the users. The machine learns the steps to detect the anomalies by itself.
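To make the bank example concrete, here is a minimal sketch of flagging users who do not behave like the rest. It is a simple statistical stand-in rather than a trained ML model: it scores each user with the modified z-score built on the median absolute deviation. The user names, the transfer counts, and the 3.5 cutoff are assumptions made up for this illustration.

```python
from statistics import median


def find_outliers(activity, threshold=3.5):
    """Return the users whose activity deviates strongly from the median.

    Uses the modified z-score based on the median absolute deviation (MAD).
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    values = list(activity.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # every user behaves (almost) identically
    return [
        user
        for user, value in activity.items()
        if 0.6745 * abs(value - med) / mad > threshold
    ]


# Hypothetical activity log: number of transfers per user in one day.
transfers_per_day = {
    "alice": 4, "bob": 5, "carol": 3, "dave": 6, "erin": 4, "mallory": 97,
}

suspicious = find_outliers(transfers_per_day)
print(suspicious)  # ['mallory']
```

Nobody told the program that 97 transfers is suspicious; the cutoff is relative to how the other users behave, which is the spirit of the example above.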

ML is everywhere on the internet, and every big tech company uses it in some way. When you watch a YouTube video, YouTube suggests other videos that you may like to watch. Even your phone uses ML to provide assistants such as the iPhone's Siri, Google Assistant, and so on. The ML field is currently advancing very fast. Researchers take old concepts, change some of them, or build on other researchers' work to make it more efficient and useful.

Let's look at the basic traditional definition of ML. In 1959, a researcher named Arthur Samuel is credited with describing ML as the field of study that gives computers the ability to learn without being explicitly programmed. This concept of ML evolved from the study of pattern recognition and computational learning theory in AI. In 1997, Tom Mitchell gave us a more precise definition that is useful to anyone who understands basic math. The definition of ML as per Tom Mitchell is: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Let's link the preceding definition with our previous example. Identifying a license plate is the task T. The example license plates that you run the ML program on are the experience E. How well the program identifies the next unseen license plates, for example the fraction it gets right, is the performance measure P; if the program learns successfully, P improves as E grows. Now it's time to explore different types of ML and how it's related to AI.
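The relationship between T, E, and P can be sketched with a toy learner. This is only an illustration under stated assumptions: the two plate formats (three letters plus four digits, and two letters, two digits, three letters), the sample plates, and the helper names shape, learn, and accuracy are all made up here. The learner simply remembers the letter/digit pattern of every valid plate it has seen.

```python
def shape(plate):
    """Abstract a plate into its pattern: letters -> 'A', digits -> '9'."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in plate
    )


def learn(examples):
    """Experience E: remember the patterns of plates known to be valid."""
    return {shape(plate) for plate in examples}


def accuracy(model, labeled_plates):
    """Performance measure P: fraction of plates classified correctly."""
    correct = sum(
        (shape(plate) in model) == is_valid for plate, is_valid in labeled_plates
    )
    return correct / len(labeled_plates)


# Hypothetical valid plates in two made-up formats.
experience = ["KLM4821", "QRS1047", "GJ05ABC", "MH12XYZ"]

# Task T: decide whether each unseen plate is valid.
unseen = [
    ("ABC1234", True), ("XY99ZZZ", True), ("TN07QWE", True),
    ("12345", False), ("AB-123", False),
]

# P measured after 0, 2, and 4 examples of experience.
scores = [accuracy(learn(experience[:n]), unseen) for n in (0, 2, 4)]
print(scores)  # [0.4, 0.6, 1.0]
```

The score rises as the learner sees more examples, which is exactly what the definition means by performance on T, as measured by P, improving with experience E.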

Types of ML

In this section, we will look at different types of ML and some interesting sub-branch and super-branch relationships.

ML itself is derived from the branch called artificial intelligence. ML also has a branch that is creating a lot of buzz nowadays called deep learning, but we will look at artificial intelligence and deep learning in detail in Chapter 9, Deep Learning for NLU and NLG Problems.

Learning techniques can be divided into different types. In this chapter, we are focusing on ML. Refer to Figure 8.1:

Figure 8.1: Subset and superset relationships of ML with other branches (Image credit: https://portfortune.files.wordpress.com/2016/10/ai-vs-ml.webp)

ML techniques can be divided into three different types, which you can see in Figure 8.2:

Figure 8.2: Three types of ML (Image credit: https://cdn-images-1.medium.com/max/1018/1*Yf8rcXiwvqEAinDTWTnCPA.webp)

We will look at each type of ML in detail. So, let's begin!

Supervised learning

In this type of ML, we provide a labeled dataset as input to the ML algorithm, so the algorithm knows what is correct and what is not correct. Here, the ML algorithm learns the mapping between the data and the labels. It generates an ML model, and the generated model can then be used to solve a given task.

Suppose we have some text data that has labels such as spam emails and non-spam emails. Each text stream of the dataset has one of these two labels. When we apply a supervised ML algorithm, it uses the labeled data and generates an ML model that predicts the label, spam or non-spam, for an unseen text stream. This is an example of supervised learning.
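The spam example can be sketched in a few lines. The following is a minimal, self-contained illustration and not the implementation used later in the book: it assumes a multinomial naive Bayes classifier with add-one smoothing, and the four training messages are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_texts):
    """Learn per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled dataset.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("please review the project report", "non-spam"),
]

model = train(training_data)
print(predict(model, "claim your free prize"))      # spam
print(predict(model, "project meeting on monday"))  # non-spam
```

The labels in training_data are the supervision: the model never sees a rule for what spam is, only examples of each class.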

Unsupervised learning

In this type of ML, we provide an unlabeled dataset as input to the ML algorithm, so the algorithm doesn't get any feedback on what is correct or not. It has to learn the structure of the data by itself to solve a given task. An unlabeled dataset is harder to use, but it is easier to obtain because not everyone has a perfectly labeled dataset. Most data is unlabeled, messy, and complex.
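As a small illustration of learning structure without labels, the sketch below groups sentences purely by word overlap. Note that this swaps in a clustering task for simplicity; it is not a summarization method. The greedy single-pass strategy, the Jaccard similarity measure, the 0.2 threshold, and the sample sentences are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.2):
    """Greedy single-pass clustering; no labels are used anywhere."""
    clusters = []  # each entry: (set of all words seen, list of member texts)
    for text in texts:
        words = set(text.lower().split())
        for vocab, members in clusters:
            if jaccard(words, vocab) >= threshold:
                vocab |= words  # grow the cluster vocabulary in place
                members.append(text)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append((words, [text]))
    return [members for _, members in clusters]


# Hypothetical unlabeled documents.
texts = [
    "the match ended with a late goal",
    "stock prices fell as markets closed",
    "a late goal decided the match",
    "markets closed after stock prices fell",
]

groups = cluster(texts)
for group in groups:
    print(group)
```

The two sports sentences end up in one group and the two finance sentences in another, even though the program was never told that those topics exist.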

Suppose we are trying to develop a summarization application. We probably don't have reference summaries that correspond to the actual documents. Then, we will use the raw, actual text documents to create a summary for the given ...

Title Page
Copyright
Credits
Foreword
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Introduction
Practical Understanding of a Corpus and Dataset
Understanding the Structure of a Sentences
Preprocessing
Feature Engineering and NLP Algorithms
Advanced Feature Engineering and NLP Algorithms
Rule-Based System for NLP
Machine Learning for NLP Problems
Deep Learning for NLU and NLG Problems
Advanced Tools
How to Improve Your NLP Skills
Installation Guide
