Machine Learning for Beginners

Learn to Build Machine Learning Systems Using Python

Harsh Bhasin

About This Book

Get familiar with various supervised, unsupervised, and reinforcement learning algorithms.

Key Features

  • Understand the types of Machine learning.
  • Get familiar with different Feature extraction methods.
  • Get an overview of how Neural Network Algorithms work.
  • Learn how to implement Decision Trees and Random Forests.
  • The book not only explains the classification algorithms but also discusses the derivations and mathematical modeling behind them.

Description

    This book covers important concepts and topics in Machine Learning. It begins with Data Cleansing and presents an overview of Feature Selection. It then talks about training and testing, cross-validation, and Feature Selection. The book covers algorithms and implementations of the most common Feature Selection techniques. The book then focuses on Linear Regression and Gradient Descent. Some of the important Classification techniques, such as K-nearest neighbors, logistic regression, Naïve Bayes, and Linear Discriminant Analysis, are covered in the book. It then gives an overview of Neural Networks and explains the biological background, the limitations of the perceptron, and the backpropagation model. Support Vector Machines and Kernel methods are also included in the book. It then shows how to implement Decision Trees and Random Forests. Towards the end, the book gives a brief overview of Unsupervised Learning. Various Feature Extraction techniques, such as the Fourier Transform, STFT, and Local Binary Patterns, are covered. The book also discusses Principal Component Analysis and its implementation.

What will you learn
  • Learn how to prepare Data for Machine Learning.
  • Learn how to implement learning algorithms from scratch.
  • Use scikit-learn to implement algorithms.
  • Use various Feature Selection and Feature Extraction methods.
  • Learn how to develop a Face recognition system.

Who this book is for

    The book is designed for undergraduate and postgraduate Computer Science students and for professionals who intend to switch to the fascinating world of Machine Learning. This book requires a basic knowledge of programming fundamentals, Python in particular.

Table of Contents
    1. An introduction to Machine Learning
    2. The beginning: Pre-Processing and Feature Selection
    3. Regression
    4. Classification
    5. Neural Networks - I
    6. Neural Networks - II
    7. Support Vector Machines
    8. Decision Trees
    9. Clustering
    10. Feature Extraction
    Appendix
    A1. Cheat Sheets
    A2. Face Detection
    A3. Bibliography

About the Author

    Harsh Bhasin is an Applied Machine Learning researcher. Mr. Bhasin worked as an Assistant Professor at Jamia Hamdard, New Delhi, and taught as guest faculty at various institutes, including Delhi Technological University. Before that, he worked in C# Client-Side Development and Algorithm Development.
    Mr. Bhasin has authored a few papers published in renowned journals, including Soft Computing, Springer, BMC Medical Informatics and Decision Making, AI and Society, etc. He is a reviewer for prominent journals and has been the editor of a few special issues. He has been a recipient of a distinguished fellowship.
    Outside work, he is deeply interested in Hindi poetry of the progressive era and in Hindustani classical music, percussion instruments in particular.
    His areas of interest include Data Structures, Algorithms Analysis and Design, Theory of Computation, Python, Machine Learning, and Deep Learning.
    LinkedIn Profile:
    https://in.linkedin.com/in/harsh-bhasin-69134426

Information

Year: 2020
ISBN: 9789389845426

CHAPTER 1

An Introduction to Machine Learning

With advances in technology, data collection has become easy. When you turn on location services on your mobile phone, upload pictures to Facebook or Instagram, fill in online forms, browse websites, or order items from an e-commerce website, your data is collected. What do companies do with this huge volume of data? They analyze it to find your preferences, which helps them in marketing. The advertisements shown to you generally depend on these preferences. Marketing professionals must lure you into buying something that you need or are even remotely interested in, and your data helps them do so. Likewise, the government may use this data to keep track of suspicious activities, trace the source of transactions, or gather other important information. However, this is easier said than done. The data is huge, and it cannot be analyzed using conventional methods.
Let us consider another example to understand this. Suppose Hari visits YouTube every day and watches videos related to Indian classical music, Hindi poetry, and Lizzie McGuire. His friend Tarush goes to YouTube and watches Beer Biceps and other videos related to workouts. After some time, YouTube starts suggesting relevant videos to each of them. While Hari is shown a video related to the Lizzie McGuire reboot or Dinkar in his list of recommended videos, Tarush is not recommended any such video. Instead, Tarush is shown a recommendation for a workout video.
Recommendation requires in-depth analysis and cannot be done using conventional algorithms alone. Those who use e-commerce websites or popular streaming platforms like YouTube will know that the recommendations are mostly good, if not excellent. Here, the task is prediction. Your browsing history helps in this task, which cannot be accomplished by conventional algorithms. Moreover, the fact that the output improves with time implies that there is a well-defined performance measure for the task.
Machine learning comes to the rescue of those wanting to analyze this huge volume of data, predict trends, find patterns, and so on. This chapter introduces machine learning, discusses its types, explains how the given data is divided, and describes the machine learning pipeline. This chapter also presents an overview of the history of machine learning and its applications.

Structure

The main topics covered in this chapter are as follows:
  • Conventional algorithm and machine learning
  • Types of learning
  • Working
  • Applications of machine learning
  • History of machine learning

Objective

After reading this chapter, the reader will be able to:
  • Understand the definition and types of machine learning
  • Understand the working of a machine learning algorithm
  • Appreciate the applications of machine learning
  • Learn about the history of machine learning

Conventional algorithm and machine learning

Solving a problem with a conventional algorithm requires input data and a program to produce an output. Here, a program is a set of instructions, and the output is generated by applying those instructions to the input data. In a machine learning algorithm, the system takes the input data along with examples of the output (in the case of supervised learning). It creates a model, which establishes (or tries to establish) some relation between the input and the output. Learning, in general, is improving the outcome using experience (E). How do we know that we have improved? The performance measure (P) quantifies how well our model performs. As per Tom Mitchell, machine learning can be defined as follows.
If the performance measure (P) improves with experience (E) on task (T), then the system is said to have learned.
Here, the task (T) can be classification, regression, clustering, and so on. The data constitutes the experience (E). The performance measure (P) can be accuracy, specificity, sensitivity, F-measure, sum of squared errors, and so on. These terms will be defined as we proceed. To understand this, let us consider an example of disease classification using Magnetic Resonance Imaging (MRI). If the fraction of patients correctly classified as diseased or not diseased (accuracy) is taken as the performance measure, then this problem can be defined as follows:
  • T: Classify given patients as diseased or not-diseased
  • P: Accuracy
  • E: The MRI images of patients
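Some of the performance measures named above can be sketched in a few lines of plain Python. The following is only an illustration, not the book's implementation: the labels are hypothetical (1 stands for diseased, 0 for not diseased), the function names are made up for this example, and the formulas are the standard definitions of accuracy, sensitivity, and specificity in terms of true and false positives and negatives.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # diseased, predicted diseased
            else:
                fp += 1  # not diseased, predicted diseased
        else:
            if truth == positive:
                fn += 1  # diseased, predicted not diseased
            else:
                tn += 1  # not diseased, predicted not diseased
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Fraction of diseased samples that are detected."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Fraction of not-diseased samples that are recognised as such."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


# Hypothetical ground truth and model predictions for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print("accuracy:   ", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
print("specificity:", specificity(y_true, y_pred))
```

In Mitchell's terms, a system would be said to have learned if a measure such as this accuracy improves as more labeled images (experience) are provided.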
The task will be accomplished by pre-processing the given data, extracting relevant features from the pre-processed data, selecting the most important features, applying a classification algorithm, and post-processing the results. In general, a machine learning pipeline consists of the following steps (Figure 1.1):
Figure 1.1: Machine learning pipeline
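A reduced version of such a pipeline can be sketched with scikit-learn. This is a minimal sketch under several assumptions, none of which come from the text: it uses the Wine dataset bundled with scikit-learn, standard scaling as the pre-processing step, an ANOVA F-score filter keeping five features as the feature selection step, and a K-nearest neighbors classifier. Feature extraction and post-processing are omitted, and the split ratio and random seed are arbitrary.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the samples (X) and their labels (y).
X, y = load_wine(return_X_y=True)

# Hold out part of the data to measure performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Chain the steps so that each one is fitted on the training data only.
pipeline = Pipeline([
    ("preprocess", StandardScaler()),                    # pre-processing
    ("select", SelectKBest(score_func=f_classif, k=5)),  # feature selection
    ("classify", KNeighborsClassifier(n_neighbors=5)),   # classification
])

pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)  # performance measure P
print(f"Test accuracy: {test_accuracy:.2f}")
```

Wrapping the steps in a single `Pipeline` object is a design choice made here so that the scaler and the feature selector never see the test samples during fitting.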
These terms will become clear in the following chapters. Pre-processing is discussed in the second chapter, which also introduces the idea of feature selection. The next six chapters discuss supervised learning techniques, and the last chapter introduces feature extraction. I decided to discuss feature extraction at the end because some of the techniques require knowledge of concepts introduced in the previous seven chapters. Having seen the definition of machine learning, let us now have a look at its types.

Types of learning

Machine learning can be classified as supervised, unsupervised, or semi-supervised. This section gives a brief overview of the types.

Supervised machine learning

This type of learning uses the labels of the data in the training set to predict the label of a sample in the test set. The training set acts as a teacher that supervises the training process. The data in these algorithms contain samples and their correct labels. The training process tries to uncover the pattern hidden in the data. That is, learning aims to relate the labels Y to the data X as y = f(x), where x is a sample and y is its label.
If the label is a discrete value, the process is termed classification. If y is a real value, it is called regression. Chapter 3 of this book introduces regression, and Chapters 4 to 8 discuss classification algorithms.
Examples of classification are face detection, voice detection, object detection, and so on. Classification essentially means placing the given sample into one of the predefined categories. Examples of regression include predicting the price of a commodity, the temperature, housing prices, and so on.
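The difference between a discrete label and a real-valued label can be illustrated with two tiny learners written in plain Python. This is only a sketch: the data are invented, the function names are hypothetical, and the two methods (a one-nearest-neighbor rule and a least-squares line fit) are chosen here for brevity rather than taken from this chapter.

```python
def nearest_neighbor_label(train_x, train_y, query):
    """Classification: return the label of the closest training sample."""
    distances = [abs(x - query) for x in train_x]
    return train_y[distances.index(min(distances))]


def fit_line(train_x, train_y):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Classification: y is one of a fixed set of categories.
sizes = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
kinds = ["small", "small", "small", "large", "large", "large"]
predicted_kind = nearest_neighbor_label(sizes, kinds, 5.2)
print("predicted category:", predicted_kind)

# Regression: y is a real value (invented areas and prices).
areas = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 70.0 + intercept
print("predicted price:", predicted_price)
```

In both cases the learner approximates f in y = f(x) from labeled samples; only the type of y differs.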

Unsupervised learning

This type of learning uses the input data (X) but no labels. It aims to learn about the data by grouping similar samples or by deducing associations. Since no teacher is involved, it is called unsupervised learning. Clustering and association come under unsupervised learning. Clustering uncovers the groupings in the data. Association, on the other hand, uncovers the rules that associate events. Chapter 9 of this book discusses clustering.
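To make the idea of grouping without labels concrete, the following sketch clusters a handful of one-dimensional values. It uses k-means purely as an illustrative choice, since the text does not name a clustering method at this point. The values, the initial centers, and the function name are all assumptions made for this example.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around the given initial centers (no labels used)."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments, centers


# Hypothetical measurements that visibly fall into two groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
assignments, centers = k_means_1d(values, centers=[0.0, 5.0])

print("cluster of each value:", assignments)
print("cluster centers:", centers)
```

Note that the algorithm is given only the values themselves. The group indices it returns are arbitrary identifiers, not labels supplied by a teacher.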
Semi-supervised learning lies between supervised and unsupervised learning. In this type of learning, only a part of the input data is labeled. Many practical problems fall into this category.

Working

This section discusses the working of a machine learning algorithm. We begin with understanding the data. This is followed by the division of the data into training and test sets. The learning algorithm is then applied to the training data, and its performance is measured.

Data

In the discussion that follows, the data is represented by X, a matrix with n rows and m columns (an n × m matrix). Here, n is the number of samples, and m is the number of features in each sample. The labels are represented by y, an n × 1 matrix. Note that the ith row of y contains the label corresponding to the ith row of X.
For example, consider the Wine dataset available at the UCI Machine Learning Repository. The data considers attributes of wines from three different cultivars but from the same region in Italy. The dataset has 13 features, which are as follows (as per the official documentation at https://archive.ics.uci.edu/ml/datasets/Wine):
  1. Alcohol
  2. Malic acid
  3. Ash
  4. Alkalinity of ash
  5. Magnesium
  6. Total phenols
  7. Flavanoids
  8. Nonflavanoid phenols
  9. Proanthocyanins
  10. Color intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline
The label is the class of the wine (1, 2, or 3); that is, the values of the 13 features determine the class of the wine. The dataset has 178 samples, so the value of n is 178, and that of m is 13. The data, X, is a 178 × 13 array, and the response variable, y, is a 178 × 1 array. Obtaining the data is followed by pre-processing, which involves many things, including removing null values. Some of these techniques are discussed in the second chapter. Once you have the data, create a training set and a test set from it.
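The shapes described above can be checked with a short sketch, assuming scikit-learn is installed and that its bundled copy of the Wine dataset is an acceptable stand-in for the UCI files. One difference to be aware of: scikit-learn encodes the three classes as 0, 1, and 2 rather than 1, 2, and 3, and returns the labels as a flat array, which is reshaped here into a column.

```python
from sklearn.datasets import load_wine

wine = load_wine()

X = wine.data                   # n x m matrix of feature values
y = wine.target.reshape(-1, 1)  # n x 1 column of class labels

n, m = X.shape
print("samples (n):", n)
print("features (m):", m)
print("shape of y:", y.shape)
print("feature names:", wine.feature_names)

# scikit-learn numbers the classes from 0, not from 1.
classes = sorted(set(wine.target.tolist()))
print("classes:", classes)

# The i-th row of y holds the label of the i-th row of X.
i = 0
print("first sample:", X[i], "label:", y[i, 0])
```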

Train test validation data

Suppose yo...
