Transformers for Natural Language Processing
eBook - ePub

Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more

Denis Rothman

  1. 384 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book Information

Publisher's Note: A new edition of this book is out now that includes working with GPT-3 and comparing the results with other models. It includes even more use cases, such as causal language analysis and computer vision tasks, as well as an introduction to OpenAI's Codex.

Key Features

  • Build and implement state-of-the-art language models, such as the original Transformer, BERT, T5, and GPT-2, using concepts that outperform classical deep learning models
  • Go through hands-on applications in Python using Google Colaboratory Notebooks with nothing to install on a local machine
  • Test transformer models on advanced use cases

Book Description

The transformer architecture has proved to be revolutionary in outperforming the classical RNN and CNN models in use today. With an apply-as-you-learn approach, Transformers for Natural Language Processing investigates in vast detail the deep learning behind machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains with transformers.

The book takes you through NLP with Python and examines various eminent models and datasets within the transformer architecture created by pioneers such as Google, Facebook, Microsoft, OpenAI, and Hugging Face.

The book trains you in three stages. The first stage introduces you to transformer architectures, starting with the original transformer, before moving on to RoBERTa, BERT, and DistilBERT models. You will discover training methods for smaller transformers that can outperform GPT-3 in some cases. In the second stage, you will apply transformers for Natural Language Understanding (NLU) and Natural Language Generation (NLG). Finally, the third stage will help you grasp advanced language understanding techniques such as optimizing social network datasets and fake news identification.

By the end of this NLP book, you will understand transformers from a cognitive science perspective and be proficient in applying pretrained transformer models by tech giants to various datasets.

What you will learn

  • Use the latest pretrained transformer models
  • Grasp the workings of the original Transformer, GPT-2, BERT, T5, and other transformer models
  • Create language understanding Python programs using concepts that outperform classical deep learning models
  • Use a variety of NLP platforms, including Hugging Face, Trax, and AllenNLP
  • Apply Python, TensorFlow, and Keras programs to sentiment analysis, text summarization, speech recognition, machine translations, and more
  • Measure the productivity of key transformers to define their scope, potential, and limits in production

Who this book is for

Since the book does not teach basic programming, you must be familiar with neural networks, Python, PyTorch, and TensorFlow in order to learn their implementation with Transformers.

Readers who can benefit the most from this book include experienced deep learning & NLP practitioners and data analysts & data scientists who want to process the increasing amounts of language-driven data.


Information

Year
2021
ISBN
9781800568631

1

Getting Started with the Model Architecture of the Transformer

Language is the essence of human communication. Civilizations would never have been born without the word sequences that form language. We now mostly live in a world of digital representations of language. Our daily lives rely on Natural Language Processing (NLP) digitalized language functions: web search engines, emails, social networks, posts, tweets, smartphone texting, translations, web pages, speech-to-text on streaming sites for transcripts, text-to-speech on hotline services, and many more everyday functions.
In December 2017, the seminal article Attention Is All You Need by Vaswani et al., written by members of Google Brain and Google Research, was published. The Transformer was born. It outperformed the existing state-of-the-art NLP models, trained faster than previous architectures, and obtained higher evaluation results. Transformers have become a key component of NLP.
The digital world would never have existed without NLP. Natural Language Processing would have remained primitive and inefficient without artificial intelligence. However, the use of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) comes at a tremendous cost in terms of calculations and machine power.
In this chapter, we will first start with the background of NLP that led to the rise of the Transformer. We will briefly go from early NLP to RNNs and CNNs. Then we will see how the Transformer overthrew the reign of RNNs and CNNs, which had prevailed for decades for sequence analysis.
Then we will open the hood of the Transformer model described by Vaswani et al. (2017) and examine the key components of its architecture. We will explore the fascinating world of attention and illustrate the key components of the Transformer.
This chapter covers the following topics:
  • The background of the Transformer
  • The architecture of the Transformer
  • The Transformer's self-attention model
  • The encoding and decoding stacks
  • Input and output embedding
  • Positional embedding
  • Self-attention
  • Multi-head attention
  • Masked multi-head attention
  • Residual connections
  • Normalization
  • Feedforward network
  • Output probabilities
Our first step will be to explore the background of the Transformer.

The background of the Transformer

In this section, we will go through the background of NLP that led to the Transformer. The Transformer model invented by Google Research has toppled decades of Natural Language Processing research, development, and implementations.
Let us first see how that happened when NLP reached a critical limit that required a new approach.
Over the past 100+ years, many great minds have worked on sequence transduction and language modeling. Machines progressively learned how to predict probable sequences of words. It would take a whole book to cite all the giants that made this happen.
In this section, I will share my favorite researchers with you to lay the ground for the arrival of the Transformer.
In the early 20th century, Andrey Markov introduced the concept of random values and created a theory of stochastic processes. We know them in artificial intelligence (AI) as Markov Decision Processes (MDPs), Markov Chains, and Markov Processes. In 1902, Markov showed that we could predict the next element of a chain, a sequence, using only the last past element of that chain. In 1913, he applied this to a 20,000-letter dataset using past sequences to predict the future letters of a chain. Bear in mind that he had no computer but managed to prove his theory, which is still in use today in AI.
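To make Markov's idea concrete, a first-order Markov chain over characters can be sketched in a few lines of Python. The toy corpus and the "most frequent follower" prediction rule below are illustrative assumptions, not a reconstruction of Markov's 1913 experiment:

```python
from collections import defaultdict, Counter

def train_markov_chain(text):
    """Count, for each character, which characters follow it."""
    transitions = defaultdict(Counter)
    for current_char, next_char in zip(text, text[1:]):
        transitions[current_char][next_char] += 1
    return transitions

def predict_next(transitions, current_char):
    """Predict the most probable next character, using only the last one."""
    followers = transitions.get(current_char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus standing in for Markov's 20,000-letter dataset
corpus = "the quick brown fox jumps over the lazy dog " * 100
model = train_markov_chain(corpus)
print(predict_next(model, "t"))  # the letter most likely to follow 't' in this corpus
```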
In 1948, Claude Shannon's The Mathematical Theory of Communication was published. He cites Andrey Markov's theory multiple times when building his probabilistic approach to sequence modeling. Claude Shannon laid the ground for a communication model based on a source encoder, a transmitter, and a receiver or semantic decoder.
In 1950, Alan Turing published his seminal article, Computing Machinery and Intelligence. He based this article on machine intelligence on his immensely successful wartime work decrypting German messages. The expression artificial intelligence was first used by John McCarthy in 1956. However, Alan Turing was already applying what we now call artificial intelligence in the 1940s to decrypt encoded messages in German.
In 1954, the Georgetown-IBM experiment used computers to translate Russian sentences into English using a rule system. A rule system is a program that runs a list of rules that will analyze language structures. Rule systems still exist. However, creating rule lists for the billions of language combinations in our digital world is a challenge yet to be met. For the moment, it seems impossible. But who knows what will happen?
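As a minimal sketch of what such a rule system looks like, the toy program below runs a short list of rules over a sentence; the two-word dictionary and the rules themselves are hypothetical, not the Georgetown-IBM rule set:

```python
# A toy rule system: each rule inspects the token list and rewrites it.
LEXICON = {"privet": "hello", "mir": "world"}  # hypothetical bilingual dictionary

def lexical_rule(tokens):
    """Rule 1: replace each source word with its dictionary translation."""
    return [LEXICON.get(token, token) for token in tokens]

def capitalization_rule(tokens):
    """Rule 2: capitalize the first word of the sentence."""
    if tokens:
        tokens[0] = tokens[0].capitalize()
    return tokens

RULES = [lexical_rule, capitalization_rule]

def translate(sentence):
    tokens = sentence.lower().split()
    for rule in RULES:  # run the list of rules in order
        tokens = rule(tokens)
    return " ".join(tokens)

print(translate("privet mir"))  # -> "Hello world"
```

Every new structure the system must handle requires new rules, which is exactly why rule lists do not scale to the billions of language combinations mentioned above.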
In 1982, John Hopfield introduced Recurrent Neural Networks (RNNs), known as Hopfield networks or "associative" neural networks. John Hopfield was inspired by W.A. Little, who wrote The Existence of Persistent States in the Brain in 1974. RNNs evolved, and LSTMs emerged as we know them. An RNN memorizes the persistent states of a sequence efficiently:
Figure 1.1: The RNN process
Each state Sn captures the information of Sn-1. When the end of the network is reached, a function F will perform an action: transduction, modeling, or any other type of sequence-based task.
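A minimal sketch of the recurrence in Figure 1.1 can be written in plain Python with NumPy: each state Sn is computed from the previous state Sn-1 and the current input, and a final function F acts on the last state. The dimensions, random weights, and the choice of tanh are illustrative assumptions:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_s, b):
    """Simple recurrent cell: S_n = tanh(W_x @ x_n + W_s @ S_{n-1} + b)."""
    state = np.zeros(W_s.shape[0])  # S_0: initial state
    for x in inputs:                # one step per element of the sequence
        state = np.tanh(W_x @ x + W_s @ state + b)
    return state                    # the last state summarizes the sequence

# Toy dimensions: 4-dimensional inputs, 8-dimensional hidden state
rng = np.random.default_rng(0)
W_x = rng.normal(size=(8, 4))
W_s = rng.normal(size=(8, 8))
b = np.zeros(8)

sequence = [rng.normal(size=4) for _ in range(5)]
final_state = rnn_forward(sequence, W_x, W_s, b)

# F: any sequence-based task head; here, a linear projection to a single score
F = rng.normal(size=(1, 8))
print((F @ final_state).item())
```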
In the 1980s, Yann LeCun designed the multi-purpose Convolutional Neural Network (CNN). He applied CNNs to text sequences, and they have been widely used for sequence transduction...
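As a rough illustration of a CNN applied to a text sequence, the PyTorch sketch below embeds token ids and slides a 1D convolution over them; the vocabulary size, embedding width, and kernel size are arbitrary assumptions rather than a reference architecture:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 100-word vocabulary, 16-dim embeddings, kernel spanning 3 tokens
embedding = nn.Embedding(num_embeddings=100, embedding_dim=16)
conv = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

token_ids = torch.tensor([[5, 12, 7, 42, 3, 99]])  # one sentence of 6 token ids
embedded = embedding(token_ids)                     # shape: (batch=1, seq=6, dim=16)
features = conv(embedded.transpose(1, 2))           # Conv1d expects (batch, dim, seq)
print(features.shape)                               # torch.Size([1, 32, 6])
```

Each convolution only sees a fixed window of neighboring tokens, so capturing long-range dependencies requires stacking many layers; attention in the Transformer, by contrast, relates any two positions in the sequence directly.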
