Section 1: Introduction – Recent Developments in the Field, Installations, and Hello World Applications
In this section, you will learn about all aspects of Transformers at an introductory level. You will write your first hello-world program with Transformers by loading community-provided pre-trained language models and running the related code, with or without a GPU. Installing and using the TensorFlow, PyTorch, conda, transformers, and sentence-transformers libraries will also be explained in detail in this section.
This section comprises the following chapters:
- Chapter 1, From Bag-of-Words to the Transformer
- Chapter 2, A Hands-On Introduction to the Subject
Chapter 1: From Bag-of-Words to the Transformer
In this chapter, we will discuss what has changed in Natural Language Processing (NLP) over the last two decades. We experienced different paradigms and finally entered the era of Transformer architectures. Each paradigm helped us gain better representations of words and documents for problem-solving. Distributional semantics describes the meaning of a word or a document with a vectorial representation, looking at distributional evidence in a collection of articles. Vectors are used to solve many problems in both supervised and unsupervised pipelines. For language-generation problems, n-gram language models have been leveraged as a traditional approach for years. However, these traditional approaches have many weaknesses that we will discuss throughout the chapter.
We will further discuss classical Deep Learning (DL) architectures such as Recurrent Neural Networks (RNNs), Feed-Forward Neural Networks (FFNNs), and Convolutional Neural Networks (CNNs). These architectures have improved performance on problems in the field and have overcome the limitations of traditional approaches. However, these models have had their own problems too. Recently, Transformer models have gained immense interest because of their effectiveness across NLP tasks, from text classification to text generation. Their main success, however, has been in effectively improving performance on multilingual and multi-task NLP problems, as well as on monolingual, single-task ones. These contributions have made Transfer Learning (TL) more feasible in NLP, which aims to make models reusable for different tasks or different languages.
Starting with the attention mechanism, we will briefly discuss the Transformer architecture and the differences from previous NLP models. In parallel with the theoretical discussions, we will show practical examples with popular NLP frameworks. For the sake of simplicity, we will choose introductory code examples that are as short as possible.
In this chapter, we will cover the following topics:
- Evolution of NLP toward Transformers
- Understanding distributional semantics
- Leveraging DL
- Overview of the Transformer architecture
- Using TL with Transformers
Technical requirements
We will be using Jupyter Notebook to run our coding exercises, which require Python 3.6.0 or later, along with the following packages, which need to be installed with the pip install command:
- scikit-learn
- nltk==3.5.0
- gensim==3.8.3
- fasttext
- keras>=2.3.0
- transformers>=4.0.0
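For example, all of the packages above can be installed in one go with pip; the version pins are those listed, so adjust them as needed for your environment:

```shell
# Install the required packages (quote version specifiers so the shell
# does not interpret ">" as output redirection)
pip install scikit-learn "nltk==3.5.0" "gensim==3.8.3" fasttext "keras>=2.3.0" "transformers>=4.0.0"
```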
All notebooks with coding exercises are available at the following GitHub link: https://github.com/PacktPublishing/Advanced-Natural-Language-Processing-with-Transformers/tree/main/CH01.
Check out the following link to see the Code in Action video: https://bit.ly/2UFPuVd
Evolution of NLP toward Transformers
We have seen profound changes in NLP over the last 20 years. During this period, we experienced different paradigms and finally entered a new era dominated mostly by the Transformer architecture. This architecture did not come out of nowhere: building on various neural-based NLP approaches, it gradually evolved into an attention-based encoder-decoder architecture, and it keeps evolving. The architecture and its variants have been successful thanks to the following developments in the last decade:
- Contextual word embeddings
- Better subword tokenization algorithms for handling unseen words or rare words
- Injecting additional memory tokens into sentences, such as Paragraph ID in Doc2vec or a Classification (CLS) token in Bidirectional Encoder Representations from Transformers (BERT)
- Attention mechanisms, which overcome the problem of forcing input sentences to encode all information into one context vector
- Multi-head self-attention
- Positional encoding to encode word order
- Parallelizable architectures that make for faster training and fine-tuning
- Model compression (distillation, quantization, and so on)
- TL (cross-lingual, multitask learning)
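To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind multi-head self-attention; the toy matrices and dimensions are illustrative only and are not taken from the chapter:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V,
    with weights given by the similarity of queries to keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

# Toy self-attention: 3 tokens, 4-dimensional representations,
# with Q, K, and V all derived from the same sequence
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)                                  # each token gets a context-mixed vector
```

Because every token attends to every other token in one matrix multiplication, this operation is easy to parallelize, which is one reason Transformers train faster than recurrent models.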
For many years, we used traditional NLP approaches such as n-gram language models, TF-IDF-based information retrieval models, and one-hot encoded document-term matrices. All these approaches have contributed a lot to the solution of many NLP problems, such as sequence classification, language generation, and language understanding. On the other hand, these traditional NLP methods have their own weaknesses—for instance, falling short in coping with sparsity, representing unseen or rare words, tracking long-term dependencies, and more. In order to cope with these weaknesses, we developed DL-based approaches such as the following:
- RNNs
- CNNs
- FFNNs
- Several variants of RNNs, CNNs, and FFNNs
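The sparsity weakness mentioned above is easy to see in practice: a count-based document-term matrix built with scikit-learn is mostly zeros even for a tiny corpus. The toy sentences below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "transformers changed natural language processing",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)        # sparse document-term matrix
n_docs, n_terms = X.shape
density = X.nnz / (n_docs * n_terms)        # fraction of non-zero entries
print(n_terms, round(density, 2))           # → 12 0.42
```

Even with only three short documents, under half of the matrix entries are non-zero, and the vocabulary dimension grows with every new word—one motivation for the dense, fixed-size embeddings discussed next.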
In 2013, Word2vec, a two-layer FFNN word-encoder model, sorted out the dimensionality problem by producing short, dense representations of words, called word embeddings. This early model managed to produce fast and efficient static word embeddings. It turned unsupervised textual data into a supervised learning problem (self-supervised learning) by either predicting the target word from its context or predicting neighboring words within a sliding window. GloVe, another widely used and popular model, argued that count-based models can be better than neural models. It leverages both the global and local statistics of a corpus to learn embeddings from word-word co-occurrence statistics. It performed well on some syntactic and semantic tasks, as shown in the following screenshot: