Building Machine Learning Systems Using Python
Practice to Train Predictive Models and Analyze Machine Learning Results with Real Use-Cases (English Edition)
Deepti Chopra
About This Book
Explore Machine Learning Techniques, Different Predictive Models, and Their Applications
Key Features
- Extensive coverage of real examples of the implementation and working of ML models.
- Includes different strategies used in Machine Learning by leading data scientists.
- Focuses on Machine Learning concepts and their evolution into algorithms.
Description
This book covers the basic concepts of Machine Learning, various learning paradigms, and the different architectures and algorithms used in these paradigms. You will learn the power of ML models by exploring predictive modeling techniques such as Regression, Clustering, and Classification. You will also get hands-on experience with methods and techniques such as Overfitting, Underfitting, Random Forest, Decision Trees, PCA, and Support Vector Machines. Real-life examples with fully working Python implementations are discussed in detail. At the end of the book you will learn about unsupervised learning, covering Hierarchical Clustering, K-means Clustering, Dimensionality Reduction, Anomaly Detection, and Principal Component Analysis.
What you will learn
- Learn to perform data engineering and analysis.
- Build prototype and production ML models from scratch.
- Develop strong proficiency in using scikit-learn and Python.
- Get hands-on experience with Random Forest, Logistic Regression, SVM, PCA, and Neural Networks.
Who this book is for
This book is meant for beginners who want to gain detailed knowledge of Machine Learning. It can also serve practitioners as a quick reference for Machine Learning fundamentals. Readers should have basic knowledge of Python and scikit-learn before reading the book.
Table of Contents
1. Introduction to Machine Learning
2. Linear Regression
3. Classification Using Logistic Regression
4. Overfitting and Regularization
5. Feasibility of Learning
6. Support Vector Machine
7. Neural Network
8. Decision Trees
9. Unsupervised Learning
10. Theory of Generalization
11. Bias and Fairness in ML
About the Author
Dr Deepti Chopra works as an Assistant Professor (IT) at Lal Bahadur Shastri Institute of Management, Delhi. She has around 7 years of teaching experience. Her areas of interest include Natural Language Processing, Computational Linguistics, and Artificial Intelligence. She is the author of three books and has published several research papers in international conferences and journals.
CHAPTER 1
Introduction
Structure
- History of machine learning
- Classification of machine learning
- Challenges faced in adopting machine learning
- Applications
Objectives
- Understanding the origin of machine learning
- Understanding the classification of machine learning algorithms
- Challenges faced in machine learning
- Applications of machine learning
History of machine learning
Classification of machine learning
- Supervised learning: In supervised learning, the learning is performed using example data and its corresponding target response. During testing, when new examples are provided, it predicts the corresponding response. This learning is similar to how a student learns from a teacher. A teacher provides some good examples for the student to memorize. The student is then able to frame general rules to solve problems and draw useful conclusions.
- Unsupervised learning: In unsupervised learning, the learning is performed using example data without its associated target response. In this type of algorithm, a restructuring of data is performed where the data is segmented into different classes. The objects that belong to the same class have a high degree of similarity.
- Reinforcement learning: Reinforcement learning is similar to unsupervised learning in that the example data has no target response; instead, each example is accompanied by positive or negative feedback. Positive feedback (a credit or reward) is given when a correct response is produced for the example data, and negative feedback (an error or penalty) is given when an incorrect response is produced.
- Semi-supervised learning: In semi-supervised learning, during training, we have example data and some of the corresponding target responses are missing. It is a combination of supervised and unsupervised learning.
- Classification: In this type of learning, each input in the training data is assigned one of two or more classes. During testing, a new input is classified into one of these classes. For example, in spam filtering, an email is classified as spam or not spam.
- Regression: Regression is performed during supervised learning. In this type of learning, the output is continuous rather than discrete.
- Clustering: Clustering is performed during unsupervised learning, in which the data is partitioned into groups; unlike the task of classification, these groups or classes are not known beforehand.
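The three task types above can be sketched with scikit-learn on tiny data sets. This is a minimal illustration; the data, model choices, and parameters are invented for this sketch and are not from the book:

```python
# Minimal sketch of classification, regression, and clustering
# using scikit-learn (toy data invented for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: supervised, discrete labels (e.g. spam = 1, not spam = 0).
X_cls = np.array([[1.0], [2.0], [8.0], [9.0]])
y_cls = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[1.5], [8.5]]))   # new inputs are assigned to the known classes

# Regression: supervised, continuous output.
X_reg = np.array([[1.0], [2.0], [3.0]])
y_reg = np.array([2.0, 4.0, 6.0])    # data follows y = 2x
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[4.0]]))          # close to 8.0

# Clustering: unsupervised, no labels; groups are discovered from the data.
X_clu = np.array([[1.0], [1.2], [8.0], [8.2]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_clu)
print(km.labels_)                    # nearby points share a cluster label
```

Note how the clustering step receives no `y` at all: the two groups emerge purely from the structure of the data, which is the defining difference from classification.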
Challenges faced in adopting machine learning
- Requirement of proper experimentation and testing: We need to conduct frequent tests on a machine learning system in order to obtain the desired outcome. A standard method is to randomly split the data set into two subsets, a training set and a testing set; when the split preserves the proportion of each class in both subsets, it is called stratification.
- Inflexible business models: We should follow an agile and flexible business policy in implementing machine learning. If one of the machine learning strategies is not working, then we need to perform more experimentation and consequently build a new robust machine learning model.
- Impact of machine learning on humans: A machine learning based system such as a movie recommendation system changes human choices over time and gradually narrows them. Interestingly, people often don't notice how they are manipulated by algorithms. Examples include movie recommendation systems, news feeds, propaganda, etc.
- False correlation: A false correlation arises when two parameters that are completely independent of each other show similar behavior, creating the illusion that they are somehow connected. Such correlations are also known as spurious correlations. For example, an increase in the number of car seat belts may coincide with a decrease in the number of astronaut deaths. This is a false correlation, since car seat belts have nothing to do with accidents occurring in space.
- Feedback loops: Feedback loops are worse than false correlations. A feedback loop is a condition where the decision of an algorithm affects reality in a way that appears to confirm the conclusion. For example, a crime prevention program suggested sending more police officers to a particular area on the basis of an increased crime rate. This led to local residents reporting crimes more frequently, simply because somebody was right there to report them to. The officers, in turn, wrote more reports and enforced more protocols, producing a higher recorded crime rate, which meant that even more police had to be sent to the area. Earlier, when police officers were not present in the area, people didn't report crimes as frequently.
- Poisoned or contaminated reference data: The outcome of a machine learning algorithm depends entirely on the reference or training data from which the machine learns. If the training data is poisoned or contaminated, the outcome of machine learning will also be incorrect. For example, if we want to develop a machine translation system and the training file consists of incorrect translations, then the output will also be incorrect.
- Trickery: Even a machine learning algorithm that is working perfectly can be tricked; noise or distortion can completely alter its outcome. For example, if a machine learning algorithm is used to analyze X-ray scans of luggage at the airport, an object placed next to a gun may prevent the algorithm from detecting the gun.
- Mastering machine learning: A data scientist is a person with expertise in machine learning. Those who are not data scientists cannot be expected to master all of machine learning; instead, they need to find the key issues in a particular domain and work to overcome them. For example, a person working on predictive modeling may not have complete knowledge of Natural Language Processing (NLP) tasks.
- Wrong assumptions are drawn: A machine learning based system needs to deal with missing values in its data sets. For example, a missing value can be replaced with the mean of the remaining values. Whatever strategy is chosen, the assumptions behind the replacement must be reliable; we must either ensure that the data has no missing values or verify that the assumptions used to fill them are well founded.
- Machine learning based systems are still not intelligent: While machine learning based systems are constantly evolving, current systems still fail. For example, Microsoft released the chatbot Tay on Twitter as an experiment, mimicking a teenage girl. The experiment was a failure: the company had to shut it down and apologize publicly for Tay's hurtful and offensive tweets.
- Computational needs are expensive: In order to perform large data processing, GPUs are used instead of CPUs. Some companies don't have GPUs, so it takes a longer time for the conventional CPUs to process large amounts of data. In some situations, even with GPUs, it may take days or weeks to complete the processing as compared to the traditional software development that may take a few minutes or hours to complete the task.
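Two of the challenges above, splitting data with stratification and replacing missing values with the mean, can be sketched with scikit-learn. The data here is invented for illustration, and mean imputation is only one of several strategies:

```python
# Sketch of a stratified train/test split and of mean imputation
# for missing values, using scikit-learn (toy data invented here).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

# Stratified split: the class proportions of y are preserved
# in both the training and the testing subsets.
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
print(sorted(y_test))    # half of each class lands in the test set

# Mean imputation: missing entries (NaN) are replaced by the column mean.
# This encodes an assumption about the data that must be justified.
data = np.array([[1.0], [3.0], [np.nan]])
imputed = SimpleImputer(strategy="mean").fit_transform(data)
print(imputed.ravel())   # NaN replaced by the mean of 1.0 and 3.0, i.e. 2.0
```

Without `stratify=y`, a random split of a small or imbalanced data set can easily place most of one class in a single subset, which distorts the test results that the experimentation process relies on.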
Applications
- Virtual personal assistants: Some of the most popular virtual personal assistants in use today are Alexa, Siri, and Google Now. They help find information whenever asked by voice. We can activate a virtual personal assistant and ask questions like "Which are the flights from London to Germany?" or "What are the tasks that need to be performed today?" To answer such queries, the assistant looks up information, searches previously asked queries, or collects information from phone apps. Machine learning is an integral part of virtual personal assistants, as they collect information and refine it based on past interactions, which is then used to generate results matching the user's preferences. Virtual personal assistants are integrated into various platforms such as mobile apps (for example, Google Allo), smartphones (for example, Samsung Bixby on the Samsung S8), and smart speakers (for example, Amazon Echo, Google Home). They are small, portable devices. Google Home is shown in Figure 1.3.
Figure 1.3: Google Home
- Traffic prediction: GPS navigation devices are used to manage traffic. A GPS device tracks the current location and velocity of a vehicle and stores this information on a central server, where it is used to generate the current traffic report. This helps prevent traffic jams and supports congestion analysis. Machine learning is used to estimate, on the basis of daily GPS reports, the areas where congestion is likely to be found. A GPS device equipped in a car is shown in Figure 1.4.
Figure 1.4: A GPS device equipped in a car
- Online transportation networks: When we book a cab using an app, it estimates the price of the rid...