Machine Learning Algorithms
eBook - ePub

Machine Learning Algorithms

  1. 360 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Machine Learning Algorithms

About this book

Build strong foundation for entering the world of Machine Learning and data science with the help of this comprehensive guideAbout This Book• Get started in the field of Machine Learning with the help of this solid, concept-rich, yet highly practical guide.• Your one-stop solution for everything that matters in mastering the whats and whys of Machine Learning algorithms and their implementation.• Get a solid foundation for your entry into Machine Learning by strengthening your roots (algorithms) with this comprehensive guide.Who This Book Is ForThis book is for IT professionals who want to enter the field of data science and are very new to Machine Learning. Familiarity with languages such as R and Python will be invaluable here.What You Will Learn• Acquaint yourself with important elements of Machine Learning• Understand the feature selection and feature engineering process• Assess performance and error trade-offs for Linear Regression• Build a data model and understand how it works by using different types of algorithm• Learn to tune the parameters of Support Vector machines• Implement clusters to a dataset• Explore the concept of Natural Processing Language and Recommendation Systems• Create a ML architecture from scratch.In DetailAs the amount of data continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies, to speech recognition. This makes machine learning well-suited to the present-day era of Big Data and Data Science. The main challenge is how to transform data into actionable knowledge.In this book you will learn all the important Machine Learning algorithms that are commonly used in the field of data science. These algorithms can be used for supervised as well as unsupervised learning, reinforcement learning, and semi-supervised learning. A few famous algorithms that are covered in this book are Linear regression, Logistic Regression, SVM, Naive Bayes, K-Means, Random Forest, TensorFlow, and Feature engineering. In this book you will also learn how these algorithms work and their practical implementation to resolve your problems. This book will also introduce you to the Natural Processing Language and Recommendation systems, which help you run multiple algorithms simultaneously.On completion of the book you will have mastered selecting Machine Learning algorithms for clustering, classification, or regression based on for your problem.Style and approachAn easy-to-follow, step-by-step guide that will help you get to grips with real -world applications of Algorithms for Machine Learning.

Tools to learn more effectively

Saving Books

Saving Books

Keyword Search

Keyword Search

Annotating Text

Annotating Text

Listen to it instead

Listen to it instead

Information

Topic Modeling and Sentiment Analysis in NLP

In this chapter, we're going to introduce some common topic modeling methods, discussing some applications. Topic modeling is a very important NLP section and its purpose is to extract semantic pieces of information out of a corpus of documents. We're going to discuss latent semantic analysis, one of most famous methods; it's based on the same philosophy already discussed for model-based recommendation systems. We'll also discuss its probabilistic variant, PLSA, which is aimed at building a latent factor probability model without any assumption of prior distributions. On the other hand, the Latent Dirichlet Allocation is a similar approach that assumes a prior Dirichlet distribution for latent variables. In the last section, we're going to discuss sentiment analysis with a concrete example based on a Twitter dataset.

Topic modeling

The main goal of topic modeling in natural language processing is to analyze a corpus in order to identify common topics among documents. In this context, even if we talk about semantics, this concept has a particular meaning, driven by a very important assumption. A topic derives from the usage of particular terms in the same document and it is confirmed by the multiplicity of different documents where the first condition is true.
In other words, we don't consider a human-oriented semantics but a statistical modeling that works with meaningful documents (this guarantees that the usage of terms is aimed to express a particular concept and, therefore, there's a human semantic purpose behind them). For this reason, the starting point of all our methods is an occurrence matrix, normally defined as a document-term matrix (we have already discussed count vectorizing and tf-idf in Chapter 12, Introduction to NLP):
In many papers, this matrix is transposed (it's a term-document one); however, scikit-learn produces document-term matrices, and, to avoid confusion, we are going to consider this structure.

Latent semantic analysis

The idea behind latent semantic analysis is factorizing Mdw so as to extract a set of latent variables (this means that we can assume their existence but they cannot be observed directly) that work as connectors between the document and terms. As discussed in Chapter 11, Introduction to Recommendation Systems, a very common decomposition method is SVD:
However, we're not interested in a full decomposition; we are interested only in the subspace defined by the top k singular values:
This approximation has the reputation of being the best one considering the Frobenius norm, so it guarantees a very high level of accuracy. When applying it to a document-term matrix, we obtain the following decomposition:
Or, in a more compact way:
Here, the first matrix defines a relationship among documents and k latent variables, and the second a relationship among k latent variables and words. Considering the structure of the original matrix and what is explained at the beginning of this chapter, we can consider the latent variables as topics that define a subspace where the documents are projected. A generic document can now be defined as:
Furthermore, each topic becomes a linear combination of words. As the weight of many words is close to zero, we can decide to take only the top r words to define a topic; therefore, we get:
Here, each hji is obtained after sorting the columns of Mtwk. To better understand the process, let's show a complete example based on a subset of Brown corpus (500 documents from the news category):
 from nltk.corpus import brown

>>> sentences = brown.sents(categories=['news'])[0:500]
>>> corpus = []

>>> for s in sentences:
>>> corpus.append(' '.join(s))
After defining the corpus, we need to tokenize and vectorize using a tf-idf approach:
 from sklearn.feature_extraction.text import TfidfVectorizer

>>> vectorizer = TfidfVectorizer(strip_accents='unicode', stop_words='english', norm='l2', sublinear_tf=True)
>>> Xc = vectorizer.fit_transform(corpus).todense()
Now it's possible to apply an SVD to the Xc matrix (remember that in SciPy, the V matrix is already transposed):
 from scipy.linalg import svd

>>> U, s, V = svd(Xc, full_matrices=False)
As the corpus is not very small, it's useful to set the parameter full_matrices=False to save computational time. We assume we have two topics, so we can extract our sub-matrices:
 import numpy as np

>>> rank = 2


>>> Uk = U[:, 0:rank]
>>> sk = np.diag(s)[0:rank, 0:rank]
>>> Vk = V[0:rank, :]
If we want to analyze the top 10 words per topic, we need to consid...

Table of contents

  1. Title Page
  2. Copyright
  3. Credits
  4. About the Author
  5. About the Reviewers
  6. www.PacktPub.com
  7. Customer Feedback
  8. Preface
  9. A Gentle Introduction to Machine Learning
  10. Important Elements in Machine Learning
  11. Feature Selection and Feature Engineering
  12. Linear Regression
  13. Logistic Regression
  14. Naive Bayes
  15. Support Vector Machines
  16. Decision Trees and Ensemble Learning
  17. Clustering Fundamentals
  18. Hierarchical Clustering
  19. Introduction to Recommendation Systems
  20. Introduction to Natural Language Processing
  21. Topic Modeling and Sentiment Analysis in NLP
  22. A Brief Introduction to Deep Learning and TensorFlow
  23. Creating a Machine Learning Architecture

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn how to download books offline
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 990+ topics, we’ve got you covered! Learn about our mission
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more about Read Aloud
Yes! You can use the Perlego app on both iOS and Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app
Yes, you can access Machine Learning Algorithms by Giuseppe Bonaccorso in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Processing. We have over one million books available in our catalogue for you to explore.