Natural Language Processing with Flair

Tadej Magajna

  1. 200 pages
  2. English

About This Book

Learn how to solve practical NLP problems with the Flair Python framework, train sequence labeling models, work with text classifiers and word embeddings, and much more through hands-on practical exercises.

Key Features
  • Backed by the community and written by an NLP expert
  • Get an understanding of basic NLP problems and terminology
  • Solve real-world NLP problems with Flair with the help of practical hands-on exercises

Book Description
Flair is an easy-to-understand natural language processing (NLP) framework designed to facilitate training and distribution of state-of-the-art NLP models for named entity recognition, part-of-speech tagging, and text classification. Flair is also a text embedding library for combining different types of embeddings, such as document embeddings, Transformer embeddings, and the proposed Flair embeddings.
Natural Language Processing with Flair takes a hands-on approach to explaining and solving real-world NLP problems. You'll begin by installing Flair and learning about the basic NLP concepts and terminology. You will explore Flair's extensive features, such as sequence tagging, text classification, and word embeddings, through practical exercises. As you advance, you will train your own sequence labeling and text classification models and learn how to use hyperparameter tuning in order to choose the right training parameters. You will learn about the idea behind one-shot and few-shot learning through a novel text classification technique TARS. Finally, you will solve several real-world NLP problems through hands-on exercises, as well as learn how to deploy Flair models to production.
By the end of this Flair book, you'll have developed a thorough understanding of typical NLP problems and you'll be able to solve them with Flair.

What you will learn
  • Gain an understanding of core NLP terminology and concepts
  • Get to grips with the capabilities of the Flair NLP framework
  • Find out how to use Flair's state-of-the-art pre-built models
  • Build custom sequence labeling models, embeddings, and classifiers
  • Learn about a novel text classification technique called TARS
  • Discover how to build applications with Flair and how to deploy them to production

Who this book is for
This Flair NLP book is for anyone who wants to learn about NLP through one of the most beginner-friendly, yet powerful Python NLP libraries out there. Software engineering students, developers, data scientists, and anyone who is transitioning into NLP and is interested in learning about practical approaches to solving problems with Flair will find this book useful. The book, however, is not recommended for readers aiming to get an in-depth theoretical understanding of the mathematics behind NLP. Beginner-level knowledge of Python programming is required to get the most out of this book.

Information

Year: 2022
ISBN: 9781801072236

Part 1: Understanding and Solving NLP with Flair

In this part, you will learn the basics of NLP and get an overview of the Flair framework. You will set up your environment, install Flair, and explore its basic features. You will learn how to extract knowledge from embeddings and use pre-trained sequence labeling models in Flair. 
This part comprises the following chapters:
  • Chapter 1, Introduction to Flair
  • Chapter 2, Flair Base Types
  • Chapter 3, Embeddings in Flair
  • Chapter 4, Sequence Tagging

Chapter 1: Introduction to Flair

There are few Natural Language Processing (NLP) frameworks out there as easy to learn and as easy to work with as Flair. Packed with pre-trained models, excellent documentation, and readable syntax, it provides a gentle learning curve for NLP researchers who are not necessarily skilled in coding; software engineers with poor theoretical foundations; students and graduates; as well as individuals with no prior knowledge simply interested in the topic. But before diving straight into coding, some background about the motivation behind Flair, the basic NLP concepts, and the different approaches to how you can set up your local environment may help you on your journey toward becoming a Flair NLP expert.
In Flair's official GitHub README, the framework is described as:
"A very simple framework for state-of-the-art Natural Language Processing"
This description will raise a few eyebrows. NLP researchers will immediately be interested in knowing what specific tasks the framework achieves its state-of-the-art results in. Engineers will be intrigued by the very simple label, but will wonder what steps are required to get up and running and what environments it can be used in. And those who are not knowledgeable in NLP will wonder whether they will be able to grasp the knowledge required to understand the problems Flair is trying to solve.
In this chapter, we will be answering all of these questions by covering the basic NLP concepts and terminology, providing an overview of Flair, and setting up our development environment with the help of the following sections:
  • A brief introduction to NLP
  • What is Flair?
  • Getting ready

Technical requirements

To get started, you will need a development environment with Python 3.6+. Platform-specific instructions for installing Python can be found at https://docs.python-guide.org/starting/installation/.
You will not require a GPU-equipped development machine, though having one will significantly speed up some of the training-related exercises described later in the book.
You will require access to a command line. On Linux and macOS, simply start the Terminal application. On Windows, press Windows + R to open the Run box, type cmd and then click OK.
Flair's official GitHub repository is available via the following link: https://github.com/flairNLP/flair. In this chapter we will install Flair version 0.11.
The code examples covered in this chapter are found in this book's official GitHub repository in the following Jupyter notebook: https://github.com/PacktPublishing/Natural-Language-Processing-with-Flair/tree/main/Chapter01.
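As a quick sanity check of the requirements above, the following minimal sketch reports whether the running interpreter meets the Python 3.6+ requirement and which version of the flair package, if any, is installed. The helper names python_is_supported and installed_version are hypothetical and are not part of Flair; the sketch relies only on the standard library (importlib.metadata is available from Python 3.8 onward, so on older interpreters the version lookup simply returns None).

```python
import sys
from typing import Optional, Tuple


def python_is_supported(
    version: Tuple[int, int] = sys.version_info[:2],
    minimum: Tuple[int, int] = (3, 6),
) -> bool:
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version) >= minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if unknown."""
    try:
        # importlib.metadata is part of the standard library from Python 3.8.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("flair version:", installed_version("flair") or "not installed")
```

If the second line prints "not installed", Flair still needs to be installed before the later exercises can be run.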

A brief introduction to NLP

Before diving straight into what Flair is capable of and how to leverage its features, we will be going through a brief introduction to NLP to provide some context for readers who are not familiar with all the NLP techniques and tasks solved by Flair. NLP is a branch of artificial intelligence, linguistics, and software engineering that helps machines understand human language. When we humans read a sentence, our brains immediately make sense of many seemingly trivial problems such as the following:
  • Is the sentence written in a language I understand?
  • How can the sentence be split into words?
  • What is the relationship between the words?
  • What are the meanings of the individual words?
  • Is this a question or an answer?
  • Which part-of-speech categories are the words assigned to?
  • What is the abstract meaning of the sentence?
The human brain is excellent at solving these problems conjointly and often seamlessly, leaving us unaware that we made sense of all of these things simply by reading a sentence.
Even now, machines are still not as good as humans at solving all these problems at once. Therefore, to teach machines to understand human language, we have to split understanding of natural language into a set of smaller, machine-intelligible tasks that allow us to get answers to these questions one by one.
In this section, you will find a list of some important NLP tasks with emphasis on the tasks supported by Flair.

Tokenization

Tokenization is the process of breaking down a sentence or a document into meaningful units called tokens. A token can be a paragraph, a sentence, a collocation, or just a word.
For example, a word tokenizer would split the sentence "Learning to use Flair" into the list of tokens ["Learning", "to", "use", "Flair"].
Tokenization has to adhere to language-specific rules and is rarely a trivial task to solve. For example, in unspaced languages, where word boundaries aren't marked with spaces, it's very difficult to determine where one word ends and the next one starts. Well-defined token boundaries are a prerequisite for most NLP tasks that aim to process words, collocations, or sentences, including the tasks explained in the rest of this chapter.
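The word tokenization example above can be sketched in a few lines of plain Python. This is a deliberately naive illustration and not Flair's tokenizer: the hypothetical word_tokenize helper assumes that words are separated by whitespace and treats each punctuation mark as its own token.

```python
import re
from typing import List


def word_tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters (letters, digits, underscore);
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = word_tokenize("Learning to use Flair")
print(tokens)  # ['Learning', 'to', 'use', 'Flair']
```

A rule this simple also shows why tokenization is language-specific: text in an unspaced language contains no whitespace to split on, so a whole sentence would come back as a single token.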

Text vectorization

Text vectorization is the process of transforming words, sentences, or documents from their written form into a numerical representation that machines can process.
One of the simplest forms of text vectorization is one-hot encoding. It maps words to binary vectors of length equal to the number of words in the dictionary. All elements of the vector are 0 apart from the element that represents the word, which is set to 1 – hence the name one-hot.
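One-hot encoding as described above can be sketched as follows. The one_hot helper and the three-word sample vocabulary are hypothetical and chosen only for illustration; the sketch assumes the dictionary is an ordered list in which each word appears once.

```python
from typing import List


def one_hot(word: str, vocabulary: List[str]) -> List[int]:
    """Return the one-hot vector of a word with respect to a vocabulary."""
    if word not in vocabulary:
        raise ValueError(f"{word!r} is not in the vocabulary")
    index = vocabulary.index(word)
    # The vector is as long as the vocabulary; only the word's position is 1.
    return [1 if i == index else 0 for i in range(len(vocabulary))]


vocabulary = ["apple", "banana", "cherry"]  # illustrative sample vocabulary
print(one_hot("banana", vocabulary))  # [0, 1, 0]
```

Because the vector length equals the dictionary size, one-hot vectors grow very long for realistic vocabularies, and every vector contains exactly one non-zero element.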
For example, take the following dictionary:
  • Cat
  • Dog
  • Goat
The word cat would be the first word in our dictionary and its one-hot encoding would be [1, 0...
