Hands-On Natural Language Processing with PyTorch 1.x
eBook - ePub

Build smart, AI-driven linguistic applications using deep learning and NLP techniques

Thomas Dop

  1. 276 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Become a proficient NLP data scientist by developing deep learning models for NLP and extracting valuable insights from structured and unstructured data

Key Features

  • Get to grips with word embeddings, semantics, labeling, and high-level word representations using practical examples
  • Learn modern approaches to NLP and explore state-of-the-art NLP models using PyTorch
  • Improve your NLP applications with innovative neural networks such as RNNs, LSTMs, and CNNs

Book Description

In the internet age, where an increasing volume of text data is generated daily from social media and other platforms, being able to make sense of that data is a crucial skill. With this book, you'll learn how to extract valuable insights from text by building deep learning models for natural language processing (NLP) tasks.

Starting with how to install PyTorch and use CUDA to accelerate processing, you'll explore how NLP architectures work with the help of practical examples. This PyTorch NLP book will guide you through core concepts such as word embeddings, CBOW, and tokenization in PyTorch. You'll then learn techniques for processing textual data and see how deep learning can be used for NLP tasks. The book demonstrates how to implement deep learning and neural network architectures to build models that will allow you to classify and translate text and perform sentiment analysis. Finally, you'll learn how to build advanced NLP models, such as conversational chatbots.

By the end of this book, you'll not only have understood the different NLP problems that can be solved using deep learning with PyTorch, but also be able to build models to solve them.

What you will learn

  • Use NLP techniques for understanding, processing, and generating text
  • Understand PyTorch, its applications and how it can be used to build deep linguistic models
  • Explore the wide variety of deep learning architectures for NLP
  • Develop the skills you need to process and represent both structured and unstructured NLP data
  • Become well-versed with state-of-the-art technologies and exciting new developments in the NLP domain
  • Create chatbots using attention-based neural networks

Who this book is for

This PyTorch book is for NLP developers, machine learning and deep learning developers, and anyone interested in building intelligent language applications using both traditional NLP approaches and deep learning architectures. If you're looking to adopt modern NLP techniques and models for your development projects, this book is for you. Working knowledge of Python programming, along with basic working knowledge of NLP tasks, is required.

Information

Year
2020
ISBN
9781789805536

Hands-On Natural Language Processing with PyTorch 1.x

Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Devika Battike
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Jyoti Chauhan
First published: July 2020
Production reference: 1080720
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78980-274-0
www.packt.com
For Mhairi and Dr. F.R. Allen
–Thomas Dop

Contributors

About the author

Thomas Dop is a data scientist at MagicLab, a company that creates leading dating apps, including Bumble and Badoo. He works on a variety of areas within data science, including NLP, deep learning, computer vision, and predictive modeling. He holds an MSc in data science from the University of Amsterdam.

About the reviewers

Nilan Saha is pursuing a Master’s degree in Data Science with a specialization in Computational Linguistics at the University of British Columbia, Canada. He has worked as an NLP contractor for multiple startups and also has brief research experience, which has resulted in a few publications. He is also a Kaggle Kernels and Discussion Expert.
Chintan Gajjar is a senior consultant at KNOWARTH Technologies. He has also contributed to books such as Hadoop Backup and Recovery Solutions, MySQL 8 for Big Data, MySQL 8 Administrator’s Guide, and Hands-on Natural Language Processing with Python. He has a Master’s degree in computer applications from Ganpat University, India.
I would like to thank the author, the co-reviewer, and the wonderful team at Packt Publishing for all their efforts, and my office colleagues, Darshan Kansara and Kathan Thakkar, for supporting me throughout the review of this book. Both are technology enthusiasts with a great understanding of AI/ML and CI/CD, and both are great mentors.

Table of Contents

Preface

Section 1: Essentials of PyTorch 1.x for NLP

Chapter 1: Fundamentals of Machine Learning and Deep Learning

Overview of machine learning

Supervised learning

Unsupervised learning

How do models learn?

Neural networks

Structure of neural networks

Activation functions

How do neural networks learn?

Overfitting in neural networks

NLP for machine learning

Bag-of-words

Sequential representation

Summary

Chapter 2: Getting Started with PyTorch 1.x for NLP

Technical requirements

Installing and using PyTorch 1.x

Tensors

Enabling PyTorch acceleration using CUDA

Comparing PyTorch to other deep learning frameworks

Building a simple neural network in PyTorch

Loading the data

Building the classifier

Implementing dropout

Defining the forward pass

Setting the model parameters

Training our network

Making predictions

Evaluating our model

NLP for PyTorch

Setting up the classifier

Training the classifier

Summary

Section 2: Fundamentals of Natural Language Processing

In this section, you will learn about the fundamentals of building a Natural Language Processing (NLP) application. You will also learn how to use various NLP techniques, such as word embeddings, CBOW, and tokenization, in PyTorch.

Chapter 3: NLP and Text Embeddings

Technical requirements

Embeddings for NLP

GLoVe

Embedding operations

Exploring CBOW

CBOW architecture

Building CBOW

Exploring n-grams

N-gram language modeling

Tokenization

Tagging and chunking for parts of speech

Tagging

Chunking

TF-IDF

Calculating TF-IDF

Implementing TF-IDF

Calculating TF-IDF weighted embeddings

Summary

Chapter 4: Text Preprocessing, Stemming, and Lemmatization

Technical requirements

Text preprocessing

Removing HTML

Converting text into lowercase

Removing punctuation

Replacing numbers

Stemming and lemmatization

Stemming

Lemmatization

Uses of stemming and lemmatization

Differences in lemmatization and stemming

Summary

Section 3: Real-World NLP Applications Using PyTorch 1.x

Chapter 5: Recurrent Neural Networks and Sentiment Analysis

Technical requirements

Building RNNs

Using RNNs for sentiment analysis

Exploding and shrinking gradients

Introducing LSTMs

Working with LSTMs

LSTM cells

Bidirectional LSTMs

Building a sentiment analyzer using LSTMs

Preprocessing the data

Model architecture

Training the model

Using our model to make predictions

Deploying the application on Heroku

Introducing Heroku

Creating an API using Flask – file structure

Creating an API using Flask – API file

Creating an API using Flask – hosting on Heroku

Summary

Chapter 6: Convolutional Neural Networks for Text Classification

Technical requirements

Exploring CNNs

Convolutions

Convolutions for NLP

Building a CNN for text classification

Defining a multi-class classification dataset

Creating iterators to load the data

Constructing the CNN model

Training the CNN

Making predictions using the trained CNN

Summary

Chapter 7: Text Translation Using Sequence-to-Sequence Neural Networks

Technical requirements

Theory of sequence-to-sequence models

Encoders

Decoders

Using teacher forcing

Building a sequence-to-sequence model for text translation

Preparing the data

Building the encoder

Building the decoder

Constructing the full sequence-to-sequence model

Training the model

Evaluating the model

Next steps

Summary

Chapter 8: Building a Chatbot Using Attention-Based Neural Networks

Technical requirements

The theory of attention within neural networks

Comparing local and global attention

Building a chatbot using sequence-to-sequence neural networks with attention

Acquiring our dataset

Processing our dataset

Creating the vocabulary

Loading the data

Removing rare words

Transforming sentence pairs to tensors

Constructing the model

Defining the training process

Defining the evaluating process

Training the model

Summary

Chapter 9: The Road Ahead

Exploring state-of-the-art NLP machine learning

BERT

BERT – Architecture

Applications of BERT

GPT-2

Comparing self-attention and masked self-attention

GPT-2 – Ethics

Future NLP tasks

Constituency parsing

Semantic role labeling

Textual entailment

Machine comprehension

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

In the internet age, where an increasing volume of text data is generated daily from social media and other platforms, being able to make sense of that data is a crucial skill. This book will help you build deep learning models for Natural Language Processing (NLP) tasks so that you can extract valuable insights from text.
We will start by learning how to install PyTorch and use CUDA to accelerate processing. You'll then explore how NLP architectures work through practical examples. Later chapters will guide you through important principles, such as word embeddings, CBOW, and tokenization in PyTorch. You'll then learn some techniques for processing textual data and see how deep learning can be used for NLP tasks. Next, we will demonstrate how to implement deep learning and neural network architectures to build models that allow you to classify and translate text and perform sentiment analysis. Finally, you will learn how to build advanced NLP models, such as conversational chatbots.
By the end of this book, you'll understand how different NLP problems can be solved using deep learning with PyTorch, as well as how to build models to solve them.

Who this book is for

This PyTorch book is for NLP developers, machine learning and deep learning developers, or anyone working toward building intelligent language applications using both traditional NLP approaches and deep learning architectures. If you're looking to adopt modern NLP techniques and models for your development projects, then this book is for you. Working knowledge of Python programming and basic working knowledge of NLP tasks are a must.

What this book covers

Chapter 1, Fundamentals of Machine Learning and Deep Learning, provides an overview of the fundamental aspects of machine learning and neural networks.
Chapter 2, Getting Started with PyTorch 1.x for NLP, shows you how to download, install, and start using PyTorch. We will also run through some of the basic functionality of the package.
Chapter 3, NLP and Text Embeddings, shows you how to create text embeddings for NLP and use them in basic language models.
Chapter 4, Text Preprocessing, Stemming, and Lemmatization, shows you how to preprocess textual data for use in NLP deep learning models.
Chapter 5, Recurrent Neural Networks and Sentiment Analysis, runs through the fundamentals of recurrent neural networks and shows you how to use them to build a sentiment analysis model from scratch.
Chapter 6, Convolutional Neural Networks for Text Classification, runs through the fundamentals of convolutional neural networks and shows you how you can use them to build a working model for classifying text.
Chapter 7, Text Translation Using Sequence-to-Sequence Neural Networks, introduces the concept of sequence-to-sequence models for deep...
