Python Natural Language Processing Cookbook

Over 50 recipes to understand, analyze, and generate text for implementing language processing tasks

  1. 284 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Get to grips with solving real-world NLP problems, such as dependency parsing, information extraction, topic modeling, and text data visualization

Key Features

  • Analyze varying complexities of text using popular Python packages such as NLTK, spaCy, sklearn, and gensim
  • Implement common and not-so-common linguistic processing tasks using Python libraries
  • Overcome the common challenges faced while implementing NLP pipelines

Book Description

Python is the most widely used language for natural language processing (NLP) thanks to its extensive tools and libraries for analyzing text and extracting computer-usable data. This book will take you through a range of techniques for text processing, from basics such as parsing the parts of speech to complex topics such as topic modeling, text classification, and visualization.

Starting with an overview of NLP, the book presents recipes for dividing text into sentences, stemming and lemmatization, removing stopwords, and parts of speech tagging to help you to prepare your data. You'll then learn ways of extracting and representing grammatical information, such as dependency parsing and anaphora resolution, discover different ways of representing the semantics using bag-of-words, TF-IDF, word embeddings, and BERT, and develop skills for text classification using keywords, SVMs, LSTMs, and other techniques. As you advance, you'll also see how to extract information from text, implement unsupervised and supervised techniques for topic modeling, and perform topic modeling of short texts, such as tweets. Additionally, the book shows you how to develop chatbots using NLTK and Rasa and visualize text data.

By the end of this NLP book, you'll have developed the skills to use a powerful set of tools for text processing.

What you will learn

  • Become well-versed with basic and advanced NLP techniques in Python
  • Represent grammatical information in text using spaCy, and semantic information using bag-of-words, TF-IDF, and word embeddings
  • Perform text classification using different methods, including SVMs and LSTMs
  • Explore different techniques for topic modeling such as K-means, LDA, NMF, and BERT
  • Work with visualization techniques such as NER and word clouds for different NLP tools
  • Build a basic chatbot using NLTK and Rasa
  • Extract information from text using regular expression techniques and statistical and deep learning tools

Who this book is for

This book is for data scientists and professionals who want to learn how to work with text. Intermediate knowledge of Python will help you to make the most out of this book. If you are an NLP practitioner, this book will serve as a code reference when working on your projects.

Information

Author
Zhenya Antić
Year
2021
Print ISBN
9781838987312
eBook ISBN
9781838987787
Edition
1

Chapter 1: Learning NLP Basics

While working on this book, I focused on including recipes that would be useful in a wide variety of natural language processing (NLP) projects. They range from the simple to the advanced and deal with everything from grammar to visualizations, and many of them include options for languages other than English. I hope you find the book useful.
Before we can get on with the real work of NLP, we need to prepare our text for processing. This chapter will show you how to do that. By the end of the chapter, you will be able to produce a list of the words in a piece of text, annotated with their parts of speech and lemmas or stems, and with very frequent words removed.
NLTK and spaCy are two important packages that we will be working with in this chapter and throughout the book.
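As a quick preview of where these recipes lead, here is a minimal sketch using spaCy (assuming the setup described in the Technical requirements section below); each piece of it, from part of speech tags to lemmas to stopwords, has its own recipe in this chapter:
import spacy

# Load the small English model installed in the Technical requirements section
nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown foxes were jumping over the lazy dogs.")

# Print each word with its part of speech and lemma, skipping stopwords and punctuation
for token in doc:
    if not token.is_stop and not token.is_punct:
        print(token.text, token.pos_, token.lemma_)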
The recipes included in this chapter are as follows:
  • Dividing text into sentences
  • Dividing sentences into words: tokenization
  • Parts of speech tagging
  • Stemming
  • Combining similar words: lemmatization
  • Removing stopwords

Technical requirements

Throughout this book, I will be showing examples that were run using an Anaconda installation of Python 3.6.10. To install Anaconda, follow the instructions here: https://docs.anaconda.com/anaconda/install/.
After you have installed Anaconda, use it to create a virtual environment:
conda create -n nlp_book python=3.6.10 anaconda
conda activate nlp_book
Then, install spaCy 2.3.0 and NLTK 3.4.5:
pip install nltk==3.4.5
pip install spacy==2.3.0
After you have installed spaCy and NLTK, install the models needed to use them. For spaCy, use this:
python -m spacy download en_core_web_sm
Use Python commands to download the necessary model for NLTK:
python
>>> import nltk
>>> nltk.download('punkt')
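To confirm that everything is in place before moving on, you can run a quick check at the same Python prompt (a minimal sketch that only verifies the installed versions and the downloaded models):
>>> import nltk, spacy
>>> print(nltk.__version__, spacy.__version__)
>>> nltk.data.find("tokenizers/punkt")  # raises LookupError if punkt is missing
>>> spacy.load("en_core_web_sm")  # raises OSError if the spaCy model is missing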
All the code that is in this book can be found in the book's GitHub repository: https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook.
Important note
The files in the book's GitHub repository should be run using the -m option from the main directory that contains the code subfolders for each chapter. For example, you would use it as follows:
python -m Chapter01.dividing_into_sentences

Dividing text into sentences

When we work with text, we can work with text units on different scales: the document itself, such as a newspaper article; the paragraph; the sentence; or the word. Sentences are the main unit of processing in many NLP tasks. In this section, I will show you how to divide text into sentences.

Getting ready

For this part, we will be using the text of the book The Adventures of Sherlock Holmes. You can find the whole text in the book's GitHub repository (see the sherlock_holmes.txt file). For this recipe, we will need just the beginning of the book, which can be found in the sherlock_holmes_1.txt file.
In order to do this task, you will need the nltk package and its sentence tokenizers, described in the Technical requirements section.

How to do it…

We will now divide the text of The Adventures of Sherlock Holmes, outputting a list of sentences:
  1. Import the nltk package:
    import nltk
  2. Read in the book text:
    filename = "sherlock_holmes_1.txt"
    file = open(filename, "r", encoding="utf-8")
    text = file.read()
  3. Replace newlines with spaces:
    text = text.replace("\n", " ")
  4. Initialize an NLTK tokenizer. This uses the punkt model we downloaded previously:
    tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
  5. Divide the text into sentences:
    sentences = tokenizer.tokenize(text)
    The resulting list, sentences, has all the sentences in the first part of the book:
    ['To Sherlock Holmes she is always _the_ woman.', 'I have seldom heard him mention her under any other name.', 'In his eyes she eclipses and predominates the whole of her sex.', 'It was not that he felt any emotion akin to love for Irene Adler.', 'All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.', 'He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position.', 'He never spoke of the softer passions, save with a gibe and a sneer.', 'They were admirable things for the observer—excellent for drawing the veil from men's motives and actions.', 'But for the trained reasoner to admit such intrusions into his own delicate and finely adjusted temperament was to introduce a distracting factor which might throw a doubt upon all his mental results.', 'Grit in a sensitive instrument, or a crack in one of his own high-power lenses, would not be more disturbing than a strong emotion in a nature such as his.', 'And yet there was but one woman to him, and that woman was the late Irene Adler, of dubious and questionable memory.']

How it works…

In step 1, we import the nltk package. In step 2, we read the contents of the file into a string. In step 3, we replace newline characters with spaces so that line breaks inside sentences do not interfere with sentence splitting. In step 4, we load the pre-trained punkt sentence tokenizer for English that we downloaded in the Technical requirements section. In step 5, we apply the tokenizer to the text, which returns the list of sentences shown above.
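As an aside, and not part of the original recipe, NLTK also exposes the same punkt model through the convenience function nltk.sent_tokenize, so an equivalent sketch would be:
import nltk

# Read the file and flatten newlines, as in steps 2 and 3
text = open("sherlock_holmes_1.txt", "r", encoding="utf-8").read().replace("\n", " ")

# sent_tokenize loads the pre-trained English punkt tokenizer behind the scenes
sentences = nltk.sent_tokenize(text)
print(sentences[0])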

Table of contents

  1. Python Natural Language Processing Cookbook
  2. Contributors
  3. Preface
  4. Chapter 1: Learning NLP Basics
  5. Chapter 2: Playing with Grammar
  6. Chapter 3: Representing Text – Capturing Semantics
  7. Chapter 4: Classifying Texts
  8. Chapter 5: Getting Started with Information Extraction
  9. Chapter 6: Topic Modeling
  10. Chapter 7: Building Chatbots
  11. Chapter 8: Visualizing Text Data
  12. Other Books You May Enjoy