
Machine Learning Fundamentals
Use Python and scikit-learn to get up and running with the hottest developments in machine learning
- 240 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
With the flexibility and features of scikit-learn and Python, build machine learning algorithms that optimize the programming process and take application performance to a whole new level.
Key Features
- Explore scikit-learn's uniform API and its application to any type of model
- Understand the difference between supervised and unsupervised models
- Learn to use machine learning through real-world examples
Book Description
As machine learning algorithms become popular, new tools that optimize these algorithms are also developed. Machine Learning Fundamentals explains how to use the syntax of scikit-learn. You'll study the difference between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem.
The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters.
By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.
What you will learn
- Understand the importance of data representation
- Gain insights into the differences between supervised and unsupervised models
- Explore data using the Matplotlib library
- Study popular algorithms, such as k-means, Mean-Shift, and DBSCAN
- Measure model performance through different metrics
- Implement a confusion matrix using scikit-learn
- Study popular algorithms, such as Naïve Bayes, Decision Tree, and SVM
- Perform error analysis to improve the performance of the model
- Learn to build a comprehensive machine learning program
Who this book is for
Machine Learning Fundamentals is designed for developers who are new to the field of machine learning and want to learn how to use the scikit-learn library to develop machine learning algorithms. You must have some knowledge and experience in Python programming, but you do not need any prior knowledge of scikit-learn or machine learning algorithms.
Appendix
Chapter 1: Introduction to scikit-learn
Activity 1: Selecting a Target Feature and Creating a Target Matrix
- Load the titanic dataset using the seaborn library. First, import the seaborn library, and then use the load_dataset("titanic") function. Next, print out the top 10 instances:

```python
import seaborn as sns

titanic = sns.load_dataset('titanic')
titanic.head(10)
```

The output should match the following screenshot:

Figure 1.23: An image showing the first 10 instances of the Titanic dataset
- The preferred target feature could be either survived or alive. This is mainly because both of them indicate whether a person survived the disaster. For the following steps, the variable chosen is survived. However, choosing alive will not affect the final shape of the variables.
- Create a variable, X, to store the features, by using drop(). As explained previously, the selected target feature is survived, which is why it is dropped from the features matrix. Create a variable, Y, to store the target matrix, using indexing to access only the values from the column survived:

```python
X = titanic.drop('survived', axis=1)
Y = titanic['survived']
```
- Print out the shape of variable X, as follows:

```python
X.shape  # (891, 14)
```

Do the same for variable Y:

```python
Y.shape  # (891,)
```
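The drop()/indexing pattern above can be sketched on a tiny, made-up DataFrame; the values below are hypothetical stand-ins, not the real Titanic data:

```python
import pandas as pd

# A miniature stand-in for the Titanic data (hypothetical values),
# used only to illustrate separating features from the target.
titanic = pd.DataFrame({
    'survived': [0, 1, 1, 0],
    'age': [22.0, 38.0, 26.0, 35.0],
    'fare': [7.25, 71.28, 7.92, 53.10],
})

# Drop the target column to build the features matrix...
X = titanic.drop('survived', axis=1)
# ...and index the target column on its own.
Y = titanic['survived']

print(X.shape)  # (4, 2)
print(Y.shape)  # (4,)
```

The same two lines applied to the real dataset produce the (891, 14) and (891,) shapes shown above.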
Activity 2: Preprocessing an Entire Dataset
- Load the dataset and create the features and target matrices:

```python
import seaborn as sns

titanic = sns.load_dataset('titanic')
X = titanic[['sex', 'age', 'fare', 'class', 'embark_town', 'alone']]
Y = titanic['survived']
X.shape  # (891, 6)
```
- Check for missing values in all features. As we did previously, use isnull() to determine whether a value is missing, and use sum() to count the occurrences of missing values along each feature:

```python
print("Sex: " + str(X['sex'].isnull().sum()))
print("Age: " + str(X['age'].isnull().sum()))
print("Fare: " + str(X['fare'].isnull().sum()))
print("Class: " + str(X['class'].isnull().sum()))
print("Embark town: " + str(X['embark_town'].isnull().sum()))
print("Alone: " + str(X['alone'].isnull().sum()))
```

The output will look as follows:

Sex: 0
Age: 177
Fare: 0
Class: 0
Embark town: 2
Alone: 0

As you can see from the preceding output, only two features contain missing values: age and embark_town.
- As age has many missing values, accounting for almost 20% of the total, those values should be replaced. The mean imputation methodology is applied, as shown in the following code:

```python
# Age: missing values
mean = X['age'].mean()
mean = mean.round()
X['age'].fillna(mean, inplace=True)
```

Figure 1.24: A screenshot displaying the output of the preceding code
After calculating the mean, the missing values are replaced by it using the fillna() function.

Note
The preceding warning may appear because the values are being replaced over a slice of the DataFrame: the variable X is created as a slice of the entire DataFrame titanic. As X is the variable that matters for the current exercise, it is not an issue to replace the values only over the slice and not over the entire DataFrame.

- Given that the number of missing values in the embark_town feature is low, the affected instances are eliminated from the features matrix:
Note
To eliminate the missing values from the embark_town feature, it is required to eliminate the entire instance (observation) from the matrix.

```python
# Embark_town: missing values
X = X[X['embark_town'].notnull()]
X.shape  # (889, 6)
```

The notnull() function detects all non-missing values over the object in question. In this case, it is used to obtain all non-missing values from the embark_town feature; indexing is then used to retrieve those instances from the entire matrix (X).

- Discover the outliers present in the numeric features. Let's use three standard deviations as the measure to calculate the min and max thresholds for the numeric features. Using the formula we have learned, the min and max thresholds are calculated and compared against the min and max values of the feature:

```python
feature = "age"
print("Min threshold: " + str(X[feature].mean() - (3 * X[feature].std())),
      " Min val: " + str(X[feature].min()))
print("Max threshold: " + str(X[feature].mean() + (3 * X[feature].std())),
      " Max val: " + str(X[feature].max()))
```

The values obtained for the preceding code are shown here:
M...
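As a minimal, self-contained sketch of the three-standard-deviation rule, the thresholds can also be used directly to flag outlying values; the ages below are hypothetical, not taken from the Titanic dataset:

```python
import pandas as pd

# Hypothetical ages: nineteen typical values plus one extreme entry (110).
age = pd.Series([22, 23, 24, 25, 25, 26, 26, 27, 27, 28,
                 28, 29, 29, 30, 30, 31, 32, 33, 34, 110], dtype=float)

# Thresholds at three standard deviations from the mean.
min_threshold = age.mean() - 3 * age.std()
max_threshold = age.mean() + 3 * age.std()

# Values falling outside the [min, max] band are treated as outliers.
outliers = age[(age < min_threshold) | (age > max_threshold)]
print(outliers.tolist())  # [110.0]
```

Note that a single extreme value also inflates the standard deviation, so on very small samples the three-standard-deviation rule can fail to flag a genuine outlier; comparing the thresholds against the feature's min and max values, as in the activity, shows whether any values fall outside the band.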
Table of contents
- Preface
- Introduction to Scikit-Learn
- Unsupervised Learning: Real-Life Applications
- Supervised Learning: Key Steps
- Supervised Learning Algorithms: Predict Annual Income
- Artificial Neural Networks: Predict Annual Income
- Building Your Own Program
- Appendix