Building Machine Learning Systems Using Python
eBook - ePub

Practice to Train Predictive Models and Analyze Machine Learning Results with Real Use-Cases (English Edition)

Deepti Chopra

About This Book

Explore Machine Learning Techniques, Different Predictive Models, and Their Applications

Key Features
  • Extensive coverage of real examples on implementation and working of ML models.
  • Includes different strategies used in Machine Learning by leading data scientists.
  • Focuses on Machine Learning concepts and their evolution to algorithms.

Description
This book covers the basic concepts of Machine Learning, the various learning paradigms, and the different architectures and algorithms used in these paradigms. You will learn the power of ML models by exploring different predictive modeling techniques such as Regression, Clustering, and Classification. You will also get hands-on experience with concepts and techniques such as Overfitting, Underfitting, Random Forest, Decision Trees, PCA, and Support Vector Machines. Real-life examples with fully working Python implementations are discussed in detail. At the end of the book, you will learn about unsupervised learning, covering Hierarchical Clustering, K-means Clustering, Dimensionality Reduction, Anomaly Detection, and Principal Component Analysis.

What you will learn
  • Learn to perform data engineering and analysis.
  • Build prototype ML models and production ML models from scratch.
  • Develop strong proficiency in using scikit-learn and Python.
  • Get hands-on experience with Random Forest, Logistic Regression, SVM, PCA, and Neural Networks.

Who this book is for
This book is meant for beginners who want to gain knowledge about Machine Learning in detail. This book can also be used by Machine Learning users for a quick reference for fundamentals in Machine Learning. Readers should have basic knowledge of Python and Scikit-Learn before reading the book.

Table of Contents
1. Introduction to Machine Learning
2. Linear Regression
3. Classification Using Logistic Regression
4. Overfitting and Regularization
5. Feasibility of Learning
6. Support Vector Machine
7. Neural Network
8. Decision Trees
9. Unsupervised Learning
10. Theory of Generalization
11. Bias and Fairness in ML

About the Authors
Dr Deepti Chopra is working as an Assistant Professor (IT) at Lal Bahadur Shastri Institute of Management, Delhi. She has around 7 years of teaching experience. Her areas of interest include Natural Language Processing, Computational Linguistics, and Artificial Intelligence. She is the author of three books and has written several research papers in various international conferences and journals.


CHAPTER 1

Introduction

Machine learning is one of the applications of artificial intelligence. Machine learning may be defined as the ability of the system to learn automatically through experience without being explicitly programmed. It is based on the development of programs that can access data and use this data to perform learning on their own. In this chapter, we will discuss the classification of machine learning, the various challenges faced in machine learning, and the applications of machine learning.

Structure

  • History of machine learning
  • Classification of machine learning
  • Challenges faced in adopting machine learning
  • Applications

Objectives

  • Understanding the origin of machine learning
  • Understanding the classification of machine learning algorithms
  • Challenges faced in machine learning
  • Applications of machine learning

History of machine learning

In the 1940s, ENIAC (Electronic Numerical Integrator and Computer), the first manually operated electronic computer, was built. At that time, the word computer meant 'a machine having intensive numerical computation capabilities'. Since the 1940s, the idea has been to build a machine that can mimic the human behavior of learning and thinking. In the 1950s, one of the first game-playing programs was developed: Arthur Samuel's checkers program, which improved its play with experience and went on to challenge strong human players. This also helped checkers players improve their skills. Around the same time, Frank Rosenblatt invented the Perceptron, which is a very simple classifier. Machine learning became popular in the 1990s, when probabilistic approaches to AI were born from the combination of statistics and computer science. Because large amounts of data had become available, scientists started building intelligent systems that could analyze and learn from that data. For example, IBM's Deep Blue beat the World Chess Champion, Garry Kasparov, in 1997.
In machine learning, software applications predict outcomes accurately without being explicitly programmed to do so. The essence of machine learning is to build algorithms that, on receiving input data, predict the output using statistical analysis and update that output as new data becomes available. The term machine learning was coined in 1959 by Arthur Samuel, an American scientist with expertise in computer gaming and artificial intelligence. Samuel is commonly credited with describing it as the field of study that "gives computers the ability to learn without being explicitly programmed". According to Tom Mitchell in 1997, "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Consider a machine learning based system that helps us find the traffic patterns at the busiest location. Here, the task T is predicting the traffic patterns, the experience E is the traffic patterns observed in the past, and the performance measure P is how closely the predictions match what actually happens. We run a machine learning algorithm and use the past traffic patterns to train the system. If the system has learned successfully, its predictions, as measured by P, improve as it is given more experience.
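Mitchell's definition can be made concrete with a toy version of the traffic example. The following is only a minimal sketch under stated assumptions: the traffic data is synthetic, the rush hours, the function names, and the "average per hour" learner are all hypothetical choices made for illustration, and the performance measure P is taken to be the mean absolute error (lower is better).

```python
import random
from collections import defaultdict
from statistics import mean

RUSH_HOURS = {7, 8, 9, 17, 18, 19}  # assumed rush hours for the toy data


def true_volume(hour):
    """Hypothetical ground truth: heavy traffic in rush hours, light otherwise."""
    return 200.0 if hour in RUSH_HOURS else 50.0


def make_experience(n, rng):
    """Experience E: n noisy past observations of (hour, vehicles counted)."""
    hours = [rng.randrange(24) for _ in range(n)]
    return [(h, true_volume(h) + rng.gauss(0, 10)) for h in hours]


def train(experience):
    """Learn the average volume per hour, plus an overall average as fallback."""
    by_hour = defaultdict(list)
    for hour, volume in experience:
        by_hour[hour].append(volume)
    overall = mean(v for _, v in experience)
    return {h: mean(vs) for h, vs in by_hour.items()}, overall


def predict(model, hour):
    """Task T: predict the traffic volume for a given hour."""
    per_hour, overall = model
    return per_hour.get(hour, overall)


def performance(model):
    """Performance measure P: mean absolute error over all 24 hours."""
    return mean(abs(predict(model, h) - true_volume(h)) for h in range(24))


rng = random.Random(0)
experience = make_experience(500, rng)
error_small = performance(train(experience[:5]))  # very little experience
error_large = performance(train(experience))      # much more experience
print(f"P with 5 observations:   {error_small:.1f}")
print(f"P with 500 observations: {error_large:.1f}")
```

With only a handful of observations, most hours have never been seen and the learner falls back to a single overall average, so its error is large. With more experience, every hour gets its own estimate and the error shrinks, which is the sense in which the program "learns".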
As machine learning is booming today, it is useful to know its various applications as well as the challenges it faces. In this chapter, we will discuss the different machine learning techniques, the challenges faced in adopting machine learning, and the various application areas of machine learning.

Classification of machine learning

On the basis of the nature of learning and the response or output available to the learning system, machine learning implementations are of four types:
  • Supervised learning: In supervised learning, the learning is performed using example data and its corresponding target response. During testing, when new examples are provided, it predicts the corresponding response. This learning is similar to how a student learns from a teacher. A teacher provides some good examples for the student to memorize. The student is then able to frame general rules to solve problems and draw useful conclusions.
  • Unsupervised learning: In unsupervised learning, the learning is performed using example data without any associated target response. In this type of algorithm, the data is restructured by segmenting it into different groups, such that the objects that belong to the same group have a high degree of similarity.
  • Reinforcement learning: Reinforcement learning is similar to unsupervised learning in that there is no target response corresponding to the example data. Instead, each example is accompanied by positive or negative feedback. Positive feedback, or credit, is given when a correct response is obtained for the example data. Negative feedback, an error or penalty, is given when an incorrect response is obtained.
  • Semi-supervised learning: In semi-supervised learning, during training, we have example data and some of the corresponding target responses are missing. It is a combination of supervised and unsupervised learning.
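The feedback-driven learning described for reinforcement learning can be sketched in a few lines. This is an illustrative toy, not a full reinforcement learning algorithm: the two actions, the `feedback` environment, and the exploration rate are hypothetical, and the learner simply keeps a running average of the feedback each action has received.

```python
import random


def feedback(action):
    """Hypothetical environment: action 'b' is correct (+1), 'a' is not (-1)."""
    return 1 if action == "b" else -1


def learn(actions=("a", "b"), steps=50, explore=0.1, seed=0):
    """Estimate the average feedback per action while mostly picking the best one."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # running average of feedback per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.choice(actions)          # occasionally try something else
        else:
            action = max(actions, key=value.get)  # otherwise use what was learned
        reward = feedback(action)
        count[action] += 1
        # Incremental update of the running average for the chosen action.
        value[action] += (reward - value[action]) / count[action]
    return value


value = learn()
best = max(value, key=value.get)
print(best)
```

No target response is ever shown to the learner. It only sees credit or penalty after each attempt, and that is enough for it to settle on the action that earns positive feedback.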
On the basis of the desired output, machine learning implementation is divided into the following types:
  • Classification: In this type of learning, each input in the training data is assigned to one of two or more classes. During testing, when we provide a new input, it is assigned to one of these classes. For example, a spam filter classifies an email as spam or not spam.
  • Regression: Regression is performed during supervised learning. In this type of learning, the output is continuous rather than discrete.
  • Clustering: Clustering is performed during unsupervised learning. The data is divided into groups and, unlike in classification, these groups or classes are not known beforehand.
The different types of machine learning algorithms are depicted in Figure 1.1 as follows:
Figure 1.1: Classification of machine learning algorithm
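The three output types can be contrasted on tiny, made-up data sets. The sketch below uses only the Python standard library, and the methods are chosen for brevity rather than taken from the book: a nearest-centroid classifier, a least-squares line, and a two-group version of k-means on one-dimensional data. All data values and function names are hypothetical.

```python
from statistics import mean

# Classification: labelled examples, discrete output (toy spam filter).
# Feature: number of suspicious words in an email (hypothetical data).
train_x = [0, 1, 1, 2, 7, 8, 9, 10]
train_y = ["not spam"] * 4 + ["spam"] * 4


def fit_centroids(xs, ys):
    """Nearest-centroid classifier: store the mean feature value per class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def classify(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# Regression: labelled examples, continuous output (least-squares line).
def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line through the points."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    variance = sum((x - mx) ** 2 for x in xs)
    slope = covariance / variance
    return slope, my - slope * mx


# Clustering: no labels; groups are discovered from the data (1-D, two groups).
def two_means(xs, iterations=10):
    """Split xs into two groups around two iteratively updated centres."""
    centres = [min(xs), max(xs)]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
            groups[nearest].append(x)
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return groups


centroids = fit_centroids(train_x, train_y)
print(classify(centroids, 6))       # a class label

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)        # a continuous value

print(two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1]))  # groups found without labels
```

The classifier and the regression both need target responses to learn from, while the clustering function receives only the raw values and has to find the groups itself.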

Challenges faced in adopting machine learning

There are various challenges in adopting machine learning for developing projects. Some of these are as follows:
  • Requirement of proper experimentation and testing: We need to test a machine learning system frequently, with proper experimentation, in order to obtain the desired outcome. A common way to test a machine learning algorithm is to randomly split the data set into two subsets, a training set and a testing set. When the split is made so that each class keeps roughly the same proportion in both subsets, it is referred to as stratification.
  • Inflexible business models: We should follow an agile and flexible business policy in implementing machine learning. If one of the machine learning strategies is not working, then we need to perform more experimentation and consequently build a new robust machine learning model.
  • Are the machine learning results ethical?: Google worked on software for a military project called Project Maven. The project applies machine learning to drone footage, and critics feared that it could contribute to autonomous weapons. Consequently, 12 employees of Google resigned in protest and more than 4000, along with over 1000 well-known scientists, signed a petition requesting the company to abandon the project.
  • Impact of machine learning on humans: A machine learning based system, such as a movie recommendation system, changes people's choices over time and gradually narrows them. People often don't notice how they are being manipulated by algorithms. Examples include recommendations for movies and news, propaganda, etc.
  • False correlation: A false correlation comes into play when two parameters that are completely independent of each other show similar behavior. This creates an illusion that these parameters are somehow connected to each other. Such a relationship is also known as a spurious correlation. For example, as the number of car seat belts increases, the number of astronaut deaths decreases. This is a false correlation since a car seat belt has nothing to do with accidents occurring in space.
  • Feedback loops: Feedback loops are worse than false correlations. A feedback loop is a condition where the decision of an algorithm affects reality in a way that appears to confirm the decision. For example, a crime prevention program suggested that more police officials be sent to a particular area on the basis of an increase in the crime rate. Local residents then reported crimes more frequently, because somebody was right there to report them to. The police officials also wrote more reports and implemented more protocols. The recorded crime rate rose as a result, which meant that even more police had to be sent to the area. Earlier, when police officials were not present in the area, people didn't report crimes as frequently.
  • Poisoned or contaminated reference data: The outcome of a machine learning algorithm purely depends on the reference data or training data that a machine learns. If the training data or reference data is poisoned or contaminated, then the outcome of machine learning will also be incorrect. For example, if we want to develop a machine translation system, and if the training file consists of incorrect translations, then the output will also be incorrect.
  • Trickery: Even if a machine learning algorithm is working perfectly, it can be tricked. Noise or distortion can completely alter the outcome of the algorithm. For example, if in the near future a machine learning algorithm is used to analyze X-ray images of luggage at the airport, an object placed next to a gun could keep the algorithm from detecting the gun.
  • Mastering machine learning: A data scientist is a person who has expertise in machine learning. Those who are not data scientists may not acquire all of the knowledge related to machine learning. They need to find the key issues in a particular domain of machine learning and then try to overcome these issues. For example, a person who is working on predictive modeling may not have a complete knowledge of a Natural Language Processing (NLP) task.
  • Wrong assumptions are drawn: A machine learning based system needs to deal with missing values in the data sets. For example, a missing value can be replaced by the mean of the observed values. Any such replacement rests on an assumption about the data, so the assumption needs to be reliable. We must therefore check the data for missing values and make sure that the assumptions used to fill them in are well founded.
  • Machine learning based systems are still not intelligent: While machine learning based systems are constantly evolving, current systems also fail. For example, as an experiment, Microsoft released a chatbot called Tay on Twitter that mimicked a teenage girl. It was a failure, and the company had to close the experiment and apologize publicly for the hurtful and offensive tweets posted by Tay.
  • Computational needs are expensive: In order to perform large data processing, GPUs are used instead of CPUs. Some companies don't have GPUs, so it takes a longer time for the conventional CPUs to process large amounts of data. In some situations, even with GPUs, it may take days or weeks to complete the processing as compared to the traditional software development that may take a few minutes or hours to complete the task.
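Two of the practical points in the list above, splitting the data into training and testing sets and replacing missing values with the mean, can be sketched in plain Python. This is a minimal illustration with hypothetical function names and made-up data. In practice, scikit-learn's `train_test_split` (with its `stratify` argument) and `SimpleImputer` cover the same ground.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_split(samples, labels, test_fraction=0.25, seed=0):
    """Randomly split data so each class keeps roughly its share in both subsets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # random choice of test examples within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical data: 8 "ham" and 4 "spam" emails, identified by number.
samples = list(range(12))
labels = ["ham"] * 8 + ["spam"] * 4
train, test = stratified_split(samples, labels)
print(len(train), len(test))

filled = impute_mean([4.0, None, 6.0, None])
print(filled)
```

Filling a gap with the mean assumes that the missing entries resemble the observed ones. If values are missing for a systematic reason, that assumption fails and the filled-in data can mislead the model.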

Applications

Machine learning is a buzzword today. There are numerous applications of machine learning. Some of the applications are shown in Figure 1.2:
Figure 1.2: Applications of machine learning
  • Virtual personal assistants: Some of the most popular examples of virtual personal assistants used today include Alexa, Siri, and Google Now. These virtual personal assistants help in finding information when asked by voice. We can activate them and ask questions like "Which are the flights from London to Germany?" or "What are the tasks that need to be performed today?" To answer such queries, virtual personal assistants search previously asked queries or collect information from phone apps. Machine learning is an integral part of virtual personal assistants: they collect information and refine it based on previous interactions, and then use it to generate results that match the given preferences. Virtual personal assistants are integrated into various platforms such as mobile apps (for example, Google Allo), smartphones (for example, Samsung Bixby on Samsung S8), smart speakers (for example, Amazon Echo, Google Home), etc. Smart speakers of this kind are small, portable devices. Google Home is shown in Figure 1.3.
    Figure 1.3: Google Home
  • Traffic prediction: In order to manage traffic, GPS navigation devices are used. GPS devices track the current location and velocity of a vehicle and store this information on a central server. This information is used to generate the current traffic report, which helps in congestion analysis and in avoiding traffic jams. Machine learning is then used to estimate the areas where congestion is likely, on the basis of daily GPS reports. A GPS device equipped in a car is shown in Figure 1.4.
    Figure 1.4: A GPS device equipped in a car
  • Online transportation networks: When we book a cab using an app, it estimates the price of the rid...
