Natural Language Processing with Flair
eBook - ePub

Natural Language Processing with Flair

Tadej Magajna

Buch teilen
  1. 200 Seiten
  2. English
  3. ePUB (handyfreundlich)
  4. Über iOS und Android verfügbar
eBook - ePub

Natural Language Processing with Flair

Tadej Magajna

Angaben zum Buch
Buchvorschau
Inhaltsverzeichnis
Quellenangaben

Über dieses Buch

Learn how to solve practical NLP problems with the Flair Python framework, train sequence labeling models, work with text classifiers and word embeddings, and much more through hands-on practical exercisesKey Features• Backed by the community and written by an NLP expert• Get an understanding of basic NLP problems and terminology• Solve real-world NLP problems with Flair with the help of practical hands-on exercisesBook DescriptionFlair is an easy-to-understand natural language processing (NLP) framework designed to facilitate training and distribution of state-of-the-art NLP models for named entity recognition, part-of-speech tagging, and text classification. Flair is also a text embedding library for combining different types of embeddings, such as document embeddings, Transformer embeddings, and the proposed Flair embeddings.Natural Language Processing with Flair takes a hands-on approach to explaining and solving real-world NLP problems. You'll begin by installing Flair and learning about the basic NLP concepts and terminology. You will explore Flair's extensive features, such as sequence tagging, text classification, and word embeddings, through practical exercises. As you advance, you will train your own sequence labeling and text classification models and learn how to use hyperparameter tuning in order to choose the right training parameters. You will learn about the idea behind one-shot and few-shot learning through a novel text classification technique TARS. Finally, you will solve several real-world NLP problems through hands-on exercises, as well as learn how to deploy Flair models to production.By the end of this Flair book, you'll have developed a thorough understanding of typical NLP problems and you'll be able to solve them with Flair.What you will learn• Gain an understanding of core NLP terminology and concepts• Get to grips with the capabilities of the Flair NLP framework• Find out how to use Flair's state-of-the-art pre-built models• Build custom sequence labeling models, embeddings, and classifiers• Learn about a novel text classification technique called TARS• Discover how to build applications with Flair and how to deploy them to productionWho this book is forThis Flair NLP book is for anyone who wants to learn about NLP through one of the most beginner-friendly, yet powerful Python NLP libraries out there. Software engineering students, developers, data scientists, and anyone who is transitioning into NLP and is interested in learning about practical approaches to solving problems with Flair will find this book useful. The book, however, is not recommended for readers aiming to get an in-depth theoretical understanding of the mathematics behind NLP. Beginner-level knowledge of Python programming is required to get the most out of this book.

Häufig gestellte Fragen

Wie kann ich mein Abo kündigen?
Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kündigen“ – ganz einfach. Nachdem du gekündigt hast, bleibt deine Mitgliedschaft für den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.
(Wie) Kann ich Bücher herunterladen?
Derzeit stehen all unsere auf Mobilgeräte reagierenden ePub-Bücher zum Download über die App zur Verfügung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die übrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.
Welcher Unterschied besteht bei den Preisen zwischen den Aboplänen?
Mit beiden Aboplänen erhältst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.
Was ist Perlego?
Wir sind ein Online-Abodienst für Lehrbücher, bei dem du für weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhältst. Mit über 1 Million Büchern zu über 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.
Unterstützt Perlego Text-zu-Sprache?
Achte auf das Symbol zum Vorlesen in deinem nächsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.
Ist Natural Language Processing with Flair als Online-PDF/ePub verfügbar?
Ja, du hast Zugang zu Natural Language Processing with Flair von Tadej Magajna im PDF- und/oder ePub-Format sowie zu anderen beliebten Büchern aus Informatica & Elaborazione di dati. Aus unserem Katalog stehen dir über 1 Million Bücher zur Verfügung.

Information

Jahr
2022
ISBN
9781801072236

Part 1: Understanding and Solving NLP with Flair

In this part, you will learn the basics of NLP and get an overview of the Flair framework. You will set up your environment, install Flair, and explore its basic features. You will learn how to extract knowledge from embeddings and use pre-trained sequence labeling models in Flair. 
This part comprises the following chapters:
  • Chapter 1, Introduction to Flair
  • Chapter 2, Flair Base Types
  • Chapter 3, Embeddings in Flair
  • Chapter 4, Sequence Tagging

Chapter 1: Introduction to Flair

There are few Natural Language Processing (NLP) frameworks out there as easy to learn and as easy to work with as Flair. Packed with pre-trained models, excellent documentation, and readable syntax, it provides a gentle learning curve for NLP researchers who are not necessarily skilled in coding; software engineers with poor theoretical foundations; students and graduates; as well as individuals with no prior knowledge simply interested in the topic. But before diving straight into coding, some background about the motivation behind Flair, the basic NLP concepts, and the different approaches to how you can set up your local environment may help you on your journey toward becoming a Flair NLP expert.
In Flair's official GitHub README, the framework is described as:
"A very simple framework for state-of-the-art Natural Language Processing"
This description will raise a few eyebrows. NLP researchers will immediately be interested in knowing what specific tasks the framework achieves its state-of-the-art results in. Engineers will be intrigued by the very simple label, but will wonder what steps are required to get up and running and what environments it can be used in. And those who are not knowledgeable in NLP will wonder whether they will be able to grasp the knowledge required to understand the problems Flair is trying to solve.
In this chapter, we will be answering all of these questions by covering the basic NLP concepts and terminology, providing an overview of Flair, and setting up our development environment with the help of the following sections:
  • A brief introduction to NLP
  • What is Flair?
  • Getting ready

Technical requirements

To get started, you will need a development environment with Python 3.6+. Platform-specific instructions for installing Python can be found at https://docs.python-guide.org/starting/installation/.
You will not require a GPU-equipped development machine, though having one will significantly speed up some of the training-related exercises described later in the book.
You will require access to a command line. On Linux and macOS, simply start the Terminal application. On Windows, press Windows + R to open the Run box, type cmd and then click OK.
Flair's official GitHub repository is available via the following link: https://github.com/flairNLP/flair. In this chapter we will install Flair version 0.11.
The code examples covered in this chapter are found in this book's official GitHub repository in the following Jupyter notebook: https://github.com/PacktPublishing/Natural-Language-Processing-with-Flair/tree/main/Chapter01.

A brief introduction to NLP

Before diving straight into what Flair is capable of and how to leverage its features, we will be going through a brief introduction to NLP to provide some context for readers who are not familiar with all the NLP techniques and tasks solved by Flair. NLP is a branch of artificial intelligence, linguistics, and software engineering that helps machines understand human language. When we humans read a sentence, our brains immediately make sense of many seemingly trivial problems such as the following:
  • Is the sentence written in a language I understand?
  • How can the sentence be split into words?
  • What is the relationship between the words?
  • What are the meanings of the individual words?
  • Is this a question or an answer?
  • Which part-of-speech categories are the words assigned to?
  • What is the abstract meaning of the sentence?
The human brain is excellent at solving these problems conjointly and often seamlessly, leaving us unaware that we made sense of all of these things simply by reading a sentence.
Even now, machines are still not as good as humans at solving all these problems at once. Therefore, to teach machines to understand human language, we have to split understanding of natural language into a set of smaller, machine-intelligible tasks that allow us to get answers to these questions one by one.
In this section, you will find a list of some important NLP tasks with emphasis on the tasks supported by Flair.

Tokenization

Tokenization is the process of breaking down a sentence or a document into meaningful units called tokens. A token can be a paragraph, a sentence, a collocation, or just a word.
For example, a word tokenizer would split the sentence Learning to use Flair into a list of tokens as ["Learning", "to", "use", "Flair"].
Tokenization has to adhere to language-specific rules and is rarely a trivial task to solve. For example, with unspaced languages where word boundaries aren't defined with spaces, it's very difficult to determine where one word ends and the next one starts. Well-defined token boundaries are a prerequisite for most NLP tasks that aim to process words, collocations, or sentences including the following tasks explained in this chapter.

Text vectorization

Text vectorization is a process of transforming words, sentences, or documents in their written form into a numerical representation understandable to machines.
One of the simplest forms of text vectorization is one-hot encoding. It maps words to binary vectors of length equal to the number of words in the dictionary. All elements of the vector are 0 apart from the element that represents the word, which is set to 1 – hence the name one-hot.
For example, take the following dictionary:
  • Cat
  • Dog
  • Goat
The word cat would be the first word in our dictionary and its one-hot encoding would be [1, 0...

Inhaltsverzeichnis