Hands-On Natural Language Processing with PyTorch 1.x
Build smart, AI-driven linguistic applications using deep learning and NLP techniques
Thomas Dop
BIRMINGHAM—MUMBAI
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Devika Battike
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Manikandan Kurup
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Rekha Nair
Production Designer: Jyoti Chauhan
First published: July 2020
Production reference: 1080720
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78980-274-0
www.packt.com
For Mhairi and Dr. F.R. Allen
–Thomas Dop
Contributors
About the author
Thomas Dop is a data scientist at MagicLab, a company that creates leading dating apps, including Bumble and Badoo. He works on a variety of areas within data science, including NLP, deep learning, computer vision, and predictive modeling. He holds an MSc in data science from the University of Amsterdam.
About the reviewers
Nilan Saha is pursuing a Master’s degree in Data Science with a specialization in Computational Linguistics at the University of British Columbia, Canada. He has worked as an NLP contractor for multiple startups and has brief research experience, which has resulted in a few publications. He is also a Kaggle Kernels and Discussion Expert.
Chintan Gajjar is a senior consultant at KNOWARTH Technologies. He has also contributed to books such as Hadoop Backup and Recovery Solutions, MySQL 8 for Big Data, MySQL 8 Administrator’s Guide, and Hands-on Natural Language Processing with Python. He has a Master’s degree in computer applications from Ganpat University, India.
I would like to thank the author, the co-reviewer, and the wonderful team at Packt Publishing for all their efforts, and my office colleagues, Darshan Kansara and Kathan Thakkar, for supporting me throughout the review of this book. Both are technology enthusiasts with a great understanding of AI/ML and CI/CD, and both are great mentors.
Preface
In the internet age, where an increasing volume of text data is generated daily by social media and other platforms, being able to make sense of that data is a crucial skill. This book will help you build deep learning models for Natural Language Processing (NLP) tasks so that you can extract valuable insights from text.
We will start by learning how to install PyTorch and use CUDA to accelerate processing. You'll then explore how NLP architectures work through practical examples. Later chapters will guide you through important principles, such as word embeddings, continuous bag-of-words (CBOW), and tokenization in PyTorch. You'll then learn some techniques for processing textual data and see how deep learning can be used for NLP tasks. Next, we will demonstrate how to implement deep learning and neural network architectures to build models that classify text, translate text, and perform sentiment analysis. Finally, you will learn how to build advanced NLP models, such as conversational chatbots.
By the end of this book, you'll understand how different NLP problems can be solved using deep learning with PyTorch, as well as how to build models to solve them.
Who this book is for
This PyTorch book is for NLP developers, machine learning and deep learning developers, or anyone working toward building intelligent language applications using both traditional NLP approaches and deep learning architectures. If you're looking to adopt modern NLP techniques and models for your development projects, then this book is for you. Working knowledge of Python programming and basic working knowledge of NLP tasks are a must.
What this book covers
Chapter 1, Fundamentals of Machine Learning and Deep Learning, provides an overview of the fundamental aspects of machine learning and neural networks.
Chapter 2, Getting Started with PyTorch 1.x for NLP, shows you how to download, install, and start using PyTorch. We will also run through some of the basic functionality of the package.
Chapter 3, NLP and Text Embeddings, shows you how to create text embeddings for NLP and use them in basic language models.
Chapter 4, Text Preprocessing, Stemming, and Lemmatization, shows you how to preprocess textual data for use in NLP deep learning models.
Chapter 5, Recurrent Neural Networks and Sentiment Analysis, runs through the fundamentals of recurrent neural networks and shows you how to use them to build a sentiment analysis model from scratch.
Chapter 6, Convolutional Neural Networks for Text Classification, runs through the fundamentals of convolutional neural networks and shows you how you can use them to build a working model for classifying text.
Chapter 7, Text Translation Using Sequence-to-Sequence Neural Networks, introduces the concept of sequence-to-sequence models for deep...