Transfer Learning for Natural Language Processing

Paul Azunre

  1. 272 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Build custom NLP models in record time by adapting pretrained machine learning models to solve specialized problems.

Summary

In Transfer Learning for Natural Language Processing you will learn:
  • Fine-tuning pretrained models with new domain data
  • Picking the right model to reduce resource usage
  • Transfer learning for neural network architectures
  • Generating text with generative pretrained transformers
  • Cross-lingual transfer learning with BERT
  • Foundations for exploring NLP academic literature

Training deep learning NLP models from scratch is costly, time-consuming, and requires massive amounts of data. In Transfer Learning for Natural Language Processing, DARPA researcher Paul Azunre reveals cutting-edge transfer learning techniques that apply customizable pretrained models to your own NLP architectures. You'll learn how to use transfer learning to deliver state-of-the-art results for language comprehension, even when working with limited labeled data. Best of all, you'll save on training time and computational costs. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Build custom NLP models in record time, even with limited datasets! Transfer learning is a machine learning technique for adapting pretrained machine learning models to solve specialized problems. This powerful approach has revolutionized natural language processing, driving improvements in machine translation, business analytics, and natural language generation.

About the book
Transfer Learning for Natural Language Processing teaches you to create powerful NLP solutions quickly by building on existing pretrained models. This instantly useful book provides crystal-clear explanations of the concepts you need to grok transfer learning along with hands-on examples so you can practice your new skills immediately. As you go, you'll apply state-of-the-art transfer learning methods to create a spam email classifier, a fact checker, and more real-world applications.

What's inside
  • Fine-tuning pretrained models with new domain data
  • Picking the right model to reduce resource use
  • Transfer learning for neural network architectures
  • Generating text with pretrained transformers

About the reader
For machine learning engineers and data scientists with some experience in NLP.

About the author
Paul Azunre holds a PhD in Computer Science from MIT and has served as a Principal Investigator on several DARPA research programs.

Table of Contents
PART 1 INTRODUCTION AND OVERVIEW
1 What is transfer learning?
2 Getting started with baselines: Data preprocessing
3 Getting started with baselines: Benchmarking and optimization
PART 2 SHALLOW TRANSFER LEARNING AND DEEP TRANSFER LEARNING WITH RECURRENT NEURAL NETWORKS (RNNS)
4 Shallow transfer learning for NLP
5 Preprocessing data for recurrent neural network deep transfer learning experiments
6 Deep transfer learning for NLP with recurrent neural networks
PART 3 DEEP TRANSFER LEARNING WITH TRANSFORMERS AND ADAPTATION STRATEGIES
7 Deep transfer learning for NLP with the transformer and GPT
8 Deep transfer learning for NLP with BERT and multilingual BERT
9 ULMFiT and knowledge distillation adaptation strategies
10 ALBERT, adapters, and multitask adaptation strategies
11 Conclusions


Information

Publisher: Manning
Year: 2021
ISBN: 9781638350996

Part 1 Introduction and overview

Chapters 1, 2, and 3 review key concepts in machine learning, present a historical overview of advances in machine learning enabling recent progress in transfer learning for NLP, and stress the importance of studying the subject. They also walk through a pair of relevant examples that serve to both review your knowledge of more traditional NLP methods and get your hands dirty with some key modern transfer learning for NLP approaches.

1 What is transfer learning?

This chapter covers
  • What exactly transfer learning is, both generally in artificial intelligence (AI) and in the context of natural language processing (NLP)
  • Typical NLP tasks and the related chronology of NLP transfer learning advances
  • An overview of transfer learning in computer vision
  • The reason for the recent popularity of NLP transfer learning techniques
Artificial intelligence (AI) has transformed modern society in a dramatic way. Machines now perform tasks that humans used to do, and they do them faster, cheaper, and, in some cases, more effectively. Popular examples include computer vision applications, which teach computers how to understand images and videos, such as for the detection of criminals in closed-circuit television camera feeds. Other computer vision applications include the detection of diseases from images of patients' organs and the identification of plant species from their leaves. Another important branch of AI, natural language processing (NLP), deals particularly with the analysis and processing of human natural language data. Examples of NLP applications include speech-to-text transcription and translation between various languages.
The most recent incarnation of the technical revolution in AI, robotics, and automation—which some refer to as the Fourth Industrial Revolution1—was sparked by the intersection of algorithmic advances for training large neural networks, the availability of vast amounts of data via the internet, and the ready availability of massively parallel computing via graphics processing units (GPUs), which were initially developed for the personal gaming market. The recent rapid advances in the automation of tasks relying on human perception, specifically computer vision and NLP, would not have been possible without these strides in neural network theory and practice, which enabled the development of sophisticated representations of input data and desired output signals for these difficult problems.
At the same time, projections of what AI will be able to accomplish have significantly exceeded what has been achieved in practice. We are warned of an apocalyptic future in which most human jobs are erased and machines replace us all, potentially even posing an existential threat. NLP is not excluded from this speculation, as it is today one of the most active research areas within AI. I hope that reading this book will help you gain a better understanding of what can realistically be expected from AI, machine learning, and NLP in the near future. However, the main purpose of this book is to arm readers with a set of actionable skills related to a recent paradigm that has become important in NLP—transfer learning.
Transfer learning aims to leverage prior knowledge from different settings—be it a different task, language, or domain—to help solve a problem at hand. It is inspired by the way humans learn: we typically do not learn things from scratch for any given problem but rather build on prior knowledge that may be related. For instance, learning to play a musical instrument is considered easier when one already knows how to play another instrument. Obviously, the more similar the instruments—an organ versus a piano, for example—the more useful prior knowledge is and the easier learning the new instrument will be. However, even if the instruments are vastly different—such as the drums versus the piano—some prior knowledge can still be useful, even if less so, such as the practice of adhering to a rhythm.
Large research laboratories, such as Lawrence Livermore National Laboratory or Sandia National Laboratories, and large internet companies, such as Google and Facebook, are able to build large, sophisticated models by training deep neural networks on billions of words and millions of images. For instance, Google's NLP model BERT (Bidirectional Encoder Representations from Transformers), which will be introduced in the next chapter, was pretrained on the English version of Wikipedia (2.5 billion words) and the BookCorpus (0.8 billion words).2 Similarly, deep convolutional neural networks (CNNs) have been trained on the more than 14 million images of the ImageNet dataset, and the learned parameters have been widely open sourced by a number of organizations. The amount of resources required to train such models from scratch is not typically available to the average practitioner of neural networks today, such as NLP engineers working at smaller businesses or students at smaller schools. Does this mean that the smaller players are locked out of being able to achieve state-of-the-art results on their problems? Most definitely not—thankfully, the concept of transfer learning promises to alleviate this concern if applied correctly.
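To make concrete what openly shared learned parameters look like in practice, the following short sketch loads pretrained BERT weights and computes contextual representations for a sentence. It assumes the Hugging Face transformers library and PyTorch as illustrative tooling choices, not the book's own code listings:

from transformers import BertModel, BertTokenizer  # assumed library choice

# Download parameters pretrained on English Wikipedia and BookCorpus;
# no billion-word corpus or GPU cluster is needed on the user's side.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transfer learning democratizes NLP.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
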
Why is transfer learning important?
Transfer learning enables you to adapt or transfer the knowledge acquired from one set of tasks and/or domains to a different set of tasks and/or domains. What this means is that a model trained with massive resources—including data, computing power, time, and cost—can, once open sourced, be fine-tuned and reused in new settings by the wider engineering community at a fraction of the original resource requirements. This represents a big step forward for the democratization of NLP and, more widely, AI. This paradigm is illustrated in figure 1.1, using the act of learning how to play a musical instrument as an example. It can be observed from the figure that information sharing between the different tasks/domains can lead to a reduction in the data required to achieve the same performance on the later, or downstream, task B.

Figure 1.1 An illustration of the advantages of the transfer learning paradigm—shown in the bottom panel—where information is shared between systems trained for different tasks/domains, versus the traditional paradigm—shown in the top panel—where training occurs in parallel between tasks/domains. In the transfer learning paradigm, reduction in data and computing requirements can be achieved via the information/knowledge sharing. For instance, we expect a person to learn to play the drums more easily if they know how to play the piano first.
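
As a minimal sketch of the downstream adaptation shown in figure 1.1—assuming the Hugging Face transformers and PyTorch libraries and a toy two-example spam dataset, all illustrative choices rather than the book's own listings—a pretrained BERT model can be fine-tuned for a new classification task as follows:

import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Reuse the pretrained weights; only a small classification head is new.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["Win a free prize now!", "Meeting moved to 3 pm."]  # toy data
labels = torch.tensor([1, 0])  # 1 = spam, 0 = not spam (assumed labels)
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    loss = model(**batch, labels=labels).loss
    loss.backward()  # only fine-tuning happens here; pretraining cost is avoided
    optimizer.step()
    optimizer.zero_grad()

Only the small classification head is trained from scratch; the rest of the network starts from the pretrained parameters, which is what makes the data and computing requirements of task B so much smaller.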

1.1 Overview of representative NLP tasks

The goal of NLP is to enable computers to understand natural human language. You can think of it as a process of systematically encoding natural language text into numerical representations that accurately portray its meaning. Although various taxonomies of typical NLP tasks exist, the following nonexhaustive list provides a framework for thinking about the scope of the problem and for appropriately framing the various examples that will be addressed by this book (a brief code sketch after the list illustrates a few of these tasks). Note that some of these tasks may (or may not, depending on the specific algorithm selected) be required by other, more difficult tasks on the list:
  • Part-of-speech (POS) tagging—Tagging a word in text with its part of speech; potential tags include verb, adjective, and noun.
  • Named entity recognition (NER)—Detecting entities in unstructured text, such as PERSON, ORGANIZATION, and LOCATION. Note that POS tagging could be part of an NER pipeline.
  • Sentence/document classification—Tagging sentences or documents with predefined categories, such as sentiments {“positive,” “negative”}, various topics {“entertainment,” “science,” “history”}, or some other predefined set of categories.
  • Sentiment analysis—Assigning to a sentence or document the sentiment expressed in it, for example, {“positive,” “negative”}. Indeed, you can arguably view this as a special case of sentence/document classification.
  • Automatic summarization—Summarizing the content of a collection of sentences or documents, usually in a few sentences or keywords.
  • Machine translation—Translating sentences/documents from one language into another language or a collection of languages.
  • Question answering—Determining an appropriate answer to a question posed by a human; for example, Question: What is the capital of Ghana? Answer: Accra.
  • Chatterbot/chatbot—Carrying out a conversation with a human convincingly, potentially aiming to accomplish some goal, such as maximizing the length of the conversation or extracting some specific information from the human. Note that a chatbot can be formulated as a question-answering system.
  • Speech recognition—Converting the audio of human speech into its text representation. Although a lot of effort has been and continues to be spent making speech recognition systems more reliable, in this book it is assumed that a text representation of the language of interest is already available.
  • Language modeling—Determining the probability distribution of a sequence of words in human language, where knowing the most likely next word in a sequence is particularly important for language generation—predicting the next word or sentence.
  • Dependency parsing—Splitting a sentence into a dependency tree that represents its grammatical structure and the relationships between its words. Note that POS tagging can be important here.
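
Many of these tasks are exposed behind a single high-level interface in modern NLP toolkits. The following brief sketch—assuming the pipeline API of the Hugging Face transformers library, an illustrative choice rather than the book's own code—exercises named entity recognition, sentence classification, question answering, and generative language modeling:

from transformers import pipeline  # assumed toolkit

# Named entity recognition: detect PERSON/ORGANIZATION/LOCATION spans
ner = pipeline("ner")
print(ner("Paul Azunre worked on DARPA programs in the United States."))

# Sentence classification / sentiment analysis
classifier = pipeline("sentiment-analysis")
print(classifier("This book makes transfer learning approachable."))

# Question answering over a supplied context passage
qa = pipeline("question-answering")
print(qa(question="What is the capital of Ghana?",
         context="Accra is the capital of Ghana."))

# Language modeling in its generative form: predict a likely continuation
generator = pipeline("text-generation")
print(generator("Transfer learning for NLP", max_length=20))

Each pipeline call downloads a default pretrained model behind the scenes, which is itself an instance of the transfer learning idea this book develops.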

1.2 Understanding NLP in the context of AI

Before proceeding with the rest of this book, it is important to understand the term natura...
