
Natural Language Processing and Computational Linguistics
A practical guide to text analysis with Python, Gensim, spaCy, and Keras
- 306 pages
- English
About this book
Work with Python and powerful open source tools such as Gensim and spaCy to apply modern text analysis, natural language processing, and computational linguistics algorithms.
About This Book
- Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
- Get hands-on with text analysis in Python, featuring natural language processing and computational linguistics algorithms
- Learn deep learning techniques for text analysis
Who This Book Is For
This book is for you if you want to dive hands-first into the interesting world of text analysis and NLP, and you're ready to work with the rich Python ecosystem of tools and datasets waiting for you!
What You Will Learn
- Understand why text analysis is important in our modern age
- Understand NLP terminology and get to know the Python tools and datasets
- Learn how to pre-process and clean textual data
- Convert textual data into vector space representations
- Use spaCy to process text
- Train your own NLP models for computational linguistics
- Use statistical learning and topic modeling algorithms for text, using Gensim and scikit-learn
- Employ deep learning techniques for text analysis using Keras
In Detail
Modern text analysis is now very accessible thanks to Python and open source tools, so discover how you can perform text analysis in this era of textual data.
This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights about your data. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now, with Python and tools like Gensim and spaCy.
You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples.
You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models and when to work with Keras for deep learning.
This book balances theory and practical hands-on examples, so you can learn about natural language processing and computational linguistics and conduct your own projects. You'll discover the rich ecosystem of Python tools available for NLP and enter the interesting world of modern text analysis.
Style and approach
The book teaches NLP from the angle of a practitioner as well as that of a student. This is a tad unusual, but given the enormous speed at which new algorithms and approaches travel from scientific beginnings to industrial implementation, first principles can be clarified with the help of entirely practical examples.
Word2Vec, Doc2Vec, and Gensim
- Word2Vec
- Doc2Vec
- Other word embeddings
Word2Vec
Using Word2Vec with Gensim
from gensim.models import word2vec
- sg: This defines the training algorithm. By default (sg=0), CBOW is used. Otherwise (sg=1), skip-gram is employed.
- size: This is the dimensionality of the feature vectors (renamed vector_size in Gensim 4.0).
- window: This is the maximum distance between the current and predicted word within a sentence.
- alpha: This is the initial learning rate (it drops linearly to min_alpha as training progresses).
- seed: This is used for the random number generator. Initial vectors for each word are seeded with a hash of the concatenation of word + str(seed). Note that for a fully deterministically reproducible run, you must also limit the model to a single worker thread, to eliminate ordering jitter from OS thread scheduling. (In Python 3, reproducibility between interpreter launches also requires the use of the PYTHONHASHSEED environment variable to control hash randomization.)
- min_count: Ignore all words with a total frequency lower than this.
- max_vocab_size: Limit RAM during vocabulary building; if there are more unique words than this, then prune the infrequent ones. Every 10 million word types need about 1 GB of RAM. Set to None for no limit (default).
- sample: This is the ...
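To make the sg, window, and min_count parameters above more concrete, here is a minimal pure-Python sketch (not Gensim's code) of the training examples each setting produces. The helper names build_vocab and training_examples are hypothetical, and the sketch assumes a fixed window; Gensim itself randomly shrinks the effective window for each word and adds subsampling and negative sampling or hierarchical softmax on top.

```python
from collections import Counter
from typing import List, Set


def build_vocab(sentences: List[List[str]], min_count: int) -> Set[str]:
    """Keep only words whose total frequency is at least min_count,
    mirroring the meaning of Word2Vec's min_count parameter."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, freq in counts.items() if freq >= min_count}


def training_examples(sentences: List[List[str]], window: int,
                      sg: int, min_count: int = 1) -> list:
    """Generate toy training examples (hypothetical helper).

    sg=1 (skip-gram): one (center word, context word) pair per context word.
    sg=0 (CBOW): one (list of context words, center word) pair per position.
    Simplification: the full window is always used here.
    """
    vocab = build_vocab(sentences, min_count)
    examples = []
    for sentence in sentences:
        # Drop words pruned by min_count before applying the window.
        words = [w for w in sentence if w in vocab]
        for i, center in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            context = [words[j] for j in range(start, end) if j != i]
            if not context:
                continue
            if sg:
                # Skip-gram: predict each context word from the center word.
                examples.extend((center, ctx) for ctx in context)
            else:
                # CBOW: predict the center word from its whole context.
                examples.append((context, center))
    return examples


corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

print(training_examples(corpus, window=1, sg=1, min_count=2)[:4])
print(training_examples(corpus, window=1, sg=0, min_count=2)[:2])
```

With window=1 and min_count=2, the singletons cat, mat, dog, and rug are pruned first, so skip-gram yields 12 (center, context) pairs such as ('sat', 'the') and ('sat', 'on'), while CBOW yields 8 (context, center) examples such as (['the', 'on'], 'sat').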
Table of contents
- Title Page
- Copyright and Credits
- Packt Upsell
- Contributors
- Preface
- What is Text Analysis?
- Python Tips for Text Analysis
- spaCy's Language Models
- Gensim – Vectorizing Text and Transformations and n-grams
- POS-Tagging and Its Applications
- NER-Tagging and Its Applications
- Dependency Parsing
- Topic Models
- Advanced Topic Modeling
- Clustering and Classifying Text
- Similarity Queries and Summarization
- Word2Vec, Doc2Vec, and Gensim
- Deep Learning for Text
- Keras and spaCy for Deep Learning
- Sentiment Analysis and ChatBots
- Other Books You May Enjoy