Machine Learning Fundamentals

Hyatt Saleh

Use Python and scikit-learn to get up and running with the hottest developments in machine learning

  1. 240 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Use the flexibility and features of Python and scikit-learn to build machine learning algorithms that optimize the programming process and take application performance to a whole new level.

Key Features

  • Explore scikit-learn's uniform API and how it applies to any type of model
  • Understand the difference between supervised and unsupervised models
  • Learn how to apply machine learning through real-world examples

Book Description

As machine learning algorithms become more popular, new tools that optimize them are also being developed. Machine Learning Fundamentals teaches you how to use the syntax of scikit-learn. You'll study the differences between supervised and unsupervised models, as well as the importance of choosing the appropriate algorithm for each dataset. You'll apply unsupervised clustering algorithms to real-world datasets to discover patterns and profiles, and explore the process of solving an unsupervised machine learning problem.

The focus of the book then shifts to supervised learning algorithms. You'll learn to implement different supervised algorithms and develop neural network structures using the scikit-learn package. You'll also learn how to perform coherent result analysis to improve the performance of the algorithm by tuning hyperparameters.

By the end of this book, you will have gained all the skills required to start programming machine learning algorithms.

What you will learn

  • Understand the importance of data representation
  • Gain insights into the differences between supervised and unsupervised models
  • Explore data using the Matplotlib library
  • Study popular algorithms, such as k-means, Mean-Shift, and DBSCAN
  • Measure model performance through different metrics
  • Implement a confusion matrix using scikit-learn (see the sketch after this list)
  • Study popular algorithms, such as Naïve Bayes, Decision Tree, and SVM
  • Perform error analysis to improve the performance of the model
  • Learn to build a comprehensive machine learning program
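
The bullet points above mention scikit-learn's uniform estimator API and the confusion matrix. The following is a minimal, illustrative sketch of both; the choice of dataset (iris) and estimator (GaussianNB) is an assumption made here for brevity and is not an example taken from the book:
    # Minimal sketch of scikit-learn's uniform fit/predict API and a confusion matrix.
    # Dataset and estimator choices are illustrative assumptions only.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import confusion_matrix

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GaussianNB()             # any scikit-learn estimator exposes fit()/predict()
    model.fit(X_train, y_train)      # train the model
    y_pred = model.predict(X_test)   # generate predictions

    print(confusion_matrix(y_test, y_pred))  # rows are true labels, columns are predictions
Swapping GaussianNB for another estimator, such as a decision tree or an SVM, leaves the surrounding code unchanged, which is the point of the uniform API.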

Who this book is for

Machine Learning Fundamentals is designed for developers who are new to the field of machine learning and want to learn how to use the scikit-learn library to develop machine learning algorithms. You must have some knowledge and experience in Python programming, but you do not need any prior knowledge of scikit-learn or machine learning algorithms.


Appendix

About

This section is included to assist students in performing the activities in the book. It includes the detailed steps that students need to follow in order to complete the activities and achieve the book's objectives.

Chapter 1: Introduction to scikit-learn

Activity 1: Selecting a Target Feature and Creating a Target Matrix

  1. Load the titanic dataset using the seaborn library. First, import the seaborn library, and then use the load_dataset("titanic") function:
    import seaborn as sns
    titanic = sns.load_dataset('titanic')
    titanic.head(10)
    Next, print out the top 10 instances; this should match the below screenshot:
    Figure 1.23: An image showing the first 10 instances of the Titanic dataset
  2. The preferred target feature could be either survived or alive. This is mainly because both of them label whether a person survived the sinking. For the following steps, the variable chosen is survived. However, choosing alive will not affect the final shape of the variables (a short sketch of this alternative appears after this activity).
  3. Create a variable, X, to store the features, by using drop(). As explained previously, the selected target feature is survived, which is why it is dropped from the features matrix.
    Create a variable, Y, to store the target matrix. Use indexing to access only the values from the survived column:
    X = titanic.drop('survived', axis=1)
    Y = titanic['survived']
  4. Print out the shape of variable X, as follows:
    X.shape
    (891, 14)
    Do the same for variable Y:
    Y.shape
    (891,)
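
As noted in step 2, choosing alive as the target instead of survived leads to feature and target matrices of exactly the same shape. The following is a minimal sketch of that alternative; the names X_alt and Y_alt are illustrative and not part of the book's solution:
    # Hypothetical alternative: drop 'alive' instead of 'survived'
    X_alt = titanic.drop('alive', axis=1)   # features matrix without the target
    Y_alt = titanic['alive']                # target vector (categorical yes/no labels rather than 0/1)
    X_alt.shape                             # (891, 14), as in step 4
    Y_alt.shape                             # (891,)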

Activity 2: Preprocessing an Entire Dataset

  1. Load the dataset and create the features and target matrices:
    import seaborn as sns
    titanic = sns.load_dataset('titanic')
    X = titanic[['sex','age','fare','class','embark_town','alone']]
    Y = titanic['survived']
    X.shape
    (891, 6)
  2. Check for missing values in all features.
    As we did previously, use isnull() to determine whether a value is missing, and use sum() to sum up the occurrences of missing values along each feature:
    print("Sex: " + str(X['sex'].isnull().sum()))
    print("Age: " + str(X['age'].isnull().sum()))
    print("Fare: " + str(X['fare'].isnull().sum()))
    print("Class: " + str(X['class'].isnull().sum()))
    print("Embark town: " + str(X['embark_town'].isnull().sum()))
    print("Alone: " + str(X['alone'].isnull().sum()))
    The output will look as follows:
    Sex: 0
    Age: 177
    Fare: 0
    Class: 0
    Embark town: 2
    Alone: 0
    As you can see from the preceding output, only two features contain missing values: age and embark_town.
  3. As age has many missing values, accounting for almost 20% of the total, the values should be replaced. The mean imputation methodology will be applied, as shown in the following code:
    # Age: missing values
    mean = X['age'].mean()
    mean = mean.round()
    X['age'].fillna(mean, inplace=True)
    Figure 1.24: A screenshot displaying the output of the preceding code
    After calculating the mean, the missing values are replaced by it using the fillna() function.

    Note

    The preceding warning may appear because the values are being replaced over a slice of the DataFrame, since the variable X was created as a slice of the entire DataFrame titanic. As X is the variable that matters for the current exercise, it is not an issue to replace the values only over the slice rather than over the entire DataFrame (a sketch that avoids the warning by taking an explicit copy appears after this activity).
  4. Given that the number of missing values in the embark_town feature is low, the instances are eliminated from the features matrix:

    Note

    To eliminate the missing values from the embark_town feature, it is required to eliminate the entire instance (observation) from the matrix.
    # Embark_town: missing values
    X = X[X['embark_town'].notnull()]
    X.shape
    (889, 6)
    The notnull() function detects all the non-missing values in the object it is called on. In this case, it is used to obtain all the non-missing values from the embark_town feature. Then, indexing is used to retrieve those instances from the entire matrix (X).
  5. Discover the outliers present in the numeric features. Let's use three standard deviations as the measure to calculate the minimum and maximum thresholds for the numeric features. Using the formula that we have learned, these thresholds are calculated and compared against the minimum and maximum values of each feature (a generalized sketch appears after this activity):
    feature = "age"
    print("Min threshold: " + str(X[feature].mean() - (3 * X[feature].std()))," Min val: " + str(X[feature].min()))
    print("Max threshold: " + str(X[feature].mean() + (3 * X[feature].std()))," Max val: " + str(X[feature].max()))
    The values obtained for the above code are shown here:
    M...
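
Two optional refinements of the preceding steps, shown here as a hedged sketch rather than as part of the book's solution: taking an explicit copy of the slice avoids the warning discussed in the note under step 3, and the three-standard-deviation rule from step 5 can be applied to every numeric feature in a loop. The sketch assumes the titanic DataFrame from step 1 is still available and that age and fare are the only numeric features in X:
    # Rebuild X as an explicit copy so that filling values does not warn about
    # operating on a slice of the original DataFrame (see the note in step 3).
    X = titanic[['sex', 'age', 'fare', 'class', 'embark_town', 'alone']].copy()
    X['age'] = X['age'].fillna(X['age'].mean().round())
    X = X[X['embark_town'].notnull()]

    # Apply the three-standard-deviation thresholds from step 5 to each numeric feature.
    for feature in ['age', 'fare']:
        mean, std = X[feature].mean(), X[feature].std()
        lower, upper = mean - 3 * std, mean + 3 * std
        outside = (X[feature] < lower) | (X[feature] > upper)
        print(feature, "-> values outside [{:.2f}, {:.2f}]:".format(lower, upper), outside.sum())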

Table of contents

  1. Preface
  2. Introduction to Scikit-Learn
  3. Unsupervised Learning: Real-Life Applications
  4. Supervised Learning: Key Steps
  5. Supervised Learning Algorithms: Predict Annual Income
  6. Artificial Neural Networks: Predict Annual Income
  7. Building Your Own Program
  8. Appendix