Transformers for Natural Language Processing
eBook - ePub

Transformers for Natural Language Processing

Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more

Denis Rothman

Share book
  1. 384 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Transformers for Natural Language Processing

Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more

Denis Rothman

Book details
Book preview
Table of contents
Citations

About This Book

Publisher's Note: A new edition of this book is out now that includes working with GPT-3 and comparing the results with other models. It includes even more use cases, such as casual language analysis and computer vision tasks, as well as an introduction to OpenAI's Codex.

Key Features

  • Build and implement state-of-the-art language models, such as the original Transformer, BERT, T5, and GPT-2, using concepts that outperform classical deep learning models
  • Go through hands-on applications in Python using Google Colaboratory Notebooks with nothing to install on a local machine
  • Test transformer models on advanced use cases

Book Description

The transformer architecture has proved to be revolutionary in outperforming the classical RNN and CNN models in use today. With an apply-as-you-learn approach, Transformers for Natural Language Processing investigates in vast detail the deep learning for machine translations, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains with transformers.

The book takes you through NLP with Python and examines various eminent models and datasets within the transformer architecture created by pioneers such as Google, Facebook, Microsoft, OpenAI, and Hugging Face.

The book trains you in three stages. The first stage introduces you to transformer architectures, starting with the original transformer, before moving on to RoBERTa, BERT, and DistilBERT models. You will discover training methods for smaller transformers that can outperform GPT-3 in some cases. In the second stage, you will apply transformers for Natural Language Understanding (NLU) and Natural Language Generation (NLG). Finally, the third stage will help you grasp advanced language understanding techniques such as optimizing social network datasets and fake news identification.

By the end of this NLP book, you will understand transformers from a cognitive science perspective and be proficient in applying pretrained transformer models by tech giants to various datasets.

What you will learn

  • Use the latest pretrained transformer models
  • Grasp the workings of the original Transformer, GPT-2, BERT, T5, and other transformer models
  • Create language understanding Python programs using concepts that outperform classical deep learning models
  • Use a variety of NLP platforms, including Hugging Face, Trax, and AllenNLP
  • Apply Python, TensorFlow, and Keras programs to sentiment analysis, text summarization, speech recognition, machine translations, and more
  • Measure the productivity of key transformers to define their scope, potential, and limits in production

Who this book is for

Since the book does not teach basic programming, you must be familiar with neural networks, Python, PyTorch, and TensorFlow in order to learn their implementation with Transformers.

Readers who can benefit the most from this book include experienced deep learning & NLP practitioners and data analysts & data scientists who want to process the increasing amounts of language-driven data.

Frequently asked questions

How do I cancel my subscription?
Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
Can/how do I download books?
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
What is the difference between the pricing plans?
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
What is Perlego?
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Do you support text-to-speech?
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Is Transformers for Natural Language Processing an online PDF/ePUB?
Yes, you can access Transformers for Natural Language Processing by Denis Rothman in PDF and/or ePUB format, as well as other popular books in Informatik & Natürliche Sprachverarbeitung. We have over one million books available in our catalogue for you to explore.

Information

Year
2021
ISBN
9781800568631

1

Getting Started with the Model Architecture of the Transformer

Language is the essence of human communication. Civilizations would never have been born without the word sequences that form language. We now mostly live in a world of digital representations of language. Our daily lives rely on Natural Language Processing (NLP) digitalized language functions: web search engines, emails, social networks, posts, tweets, smartphone texting, translations, web pages, speech-to-text on streaming sites for transcripts, text-to-speech on hotline services, and many more everyday functions.
In December 2017, the seminal Vaswani et al. Attention Is All You Need article, written by Google Brain members and Google Research, was published. The Transformer was born. The Transformer outperformed the existing state-of-the-art NLP models. The Transformer trained faster than previous architectures and obtained higher evaluation results. Transformers have become a key component of NLP.
The digital world would never have existed without NLP. Natural Language Processing would have remained primitive and inefficient without artificial intelligence. However, the use of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) comes at a tremendous cost in terms of calculations and machine power.
In this chapter, we will first start with the background of NLP that led to the rise of the Transformer. We will briefly go from early NLP to RNNs and CNNs. Then we will see how the Transformer overthrew the reign of RNNs and CNNs, which had prevailed for decades for sequence analysis.
Then we will open the hood of the Transformer model described by Vaswani et al. (2017) and examine the key components of its architecture. We will explore the fascinating world of attention and illustrate the key components of the Transformer.
This chapter covers the following topics:
  • The background of the Transformer
  • The architecture of the Transformer
  • The Transformer's self-attention model
  • The encoding and decoding stacks
  • Input and output embedding
  • Positional embedding
  • Self-attention
  • Multi-head attention
  • Masked multi-attention
  • Residual connections
  • Normalization
  • Feedforward network
  • Output probabilities
Our first step will be to explore the background of the Transformer.

The background of the Transformer

In this section, we will go through the background of NLP that led to the Transformer. The Transformer model invented by Google Research has toppled decades of Natural Language Processing research, development, and implementations.
Let us first see how that happened when NLP reached a critical limit that required a new approach.
Over the past 100+ years, many great minds have worked on sequence transduction and language modeling. Machines progressively learned how to predict probable sequences of words. It would take a whole book to cite all the giants that made this happen.
In this section, I will share my favorite researchers with you to lay the ground for the arrival of the Transformer.
In the early 20th century, Andrey Markov introduced the concept of random values and created a theory of stochastic processes. We know them in artificial intelligence (AI) as Markov Decision Processes (MDPs), Markov Chains, and Markov Processes. In 1902, Markov showed that we could predict the next element of a chain, a sequence, using only the last past element of that chain. In 1913, he applied this to a 20,000-letter dataset using past sequences to predict the future letters of a chain. Bear in mind that he had no computer but managed to prove his theory, which is still in use today in AI.
In 1948, Claude Shannon's The Mathematical Theory of Communication was published. He cites Andrey Markov's theory multiple times when building his probabilistic approach to sequence modeling. Claude Shannon laid the ground for a communication model based on a source encoder, a transmitter, and a received decoder or semantic decoder.
In 1950, Alan Turing published his seminal article: Computing Machinery and Intelligence. Alan Turing based this article on machine intelligence on the immensely successful Turing Machine that decrypted German messages. The expression artificial intelligence was first used by John McCarthy in 1956. However, Alan Turing was implementing artificial intelligence in the 1940s to decode encrypted encoded messages in German.
In 1954, the Georgetown-IBM experiment used computers to translate Russian sentences into English using a rule system. A rule system is a program that runs a list of rules that will analyze language structures. Rule systems still exist. However, creating rule lists for the billions of language combinations in our digital world is a challenge yet to be met. For the moment, it seems impossible. But who knows what will happen?
In 1982, John Hopfield introduced Recurrent Neural Networks (RNNs), known as Hopfield networks or "associative" neural networks. John Hopfield was inspired by W.A. Little, who wrote The Existence of Persistent States in the Brain in 1974. RNNs evolved, and LSTMs emerged as we know them. An RNN memorizes the persistent states of a sequence efficiently:
Figure 1.1: The RNN process
Each state Sn captures the information of Sn-1 When the end of the network is reached, a function F will perform an action: transduction, modeling, or any other type of sequence-based task.
In the 1980s, Yann Le Cun designed the multi-purpose Convolutional Neural Network (CNN). He applied CNNs to text sequences, and they have been widely used for sequence transducti...

Table of contents