eBook - ePub

Machine Learning Cookbook with Python

Name: Machine Learning Cookbook with Python
Author: Rehan Guha

Create ML and Data Analytics Projects Using Some Amazing Open Datasets (English Edition)



About this book

A cookbook that will help you implement Machine Learning algorithms and techniques by building real-world projects.

Key Features

Learn how to handle an entire Machine Learning Pipeline supported with adequate mathematics.
Create Predictive Models and choose the right model for various types of Datasets.
Learn the art of tuning a model to improve accuracy as per Business requirements.
Get familiar with concepts related to Data Analytics with Visualization, Data Science and Machine Learning.
Description
Machine Learning does not have to be intimidating. This book focuses on the concepts of Machine Learning and Data Analytics, with mathematical explanations and programming examples. All the code is written in Python, as it is one of the most popular programming languages for Data Science and Machine Learning. I have leveraged libraries such as NumPy, Pandas, and scikit-learn to ease our task and avoid reinventing the wheel. There are five projects in total, each addressing a unique problem. With the recipes in this cookbook, you will learn how to solve Machine Learning problems for real-time data and perform Data Analysis and Analytics, Classification, and beyond. The datasets used are also unique and will help you think, understand the problem, and proceed towards the goal. The book is not saturated with Mathematics, but most of the mathematical concepts behind the important topics are covered. Every chapter typically starts with some theory and prerequisites and then gradually moves into implementing the same concept in Python, with a project as the backdrop.
What will you learn
Understand the working of the O.S.E.M.N. framework in Data Science.
Get familiar with the end-to-end implementation of Machine Learning Pipeline.
Learn how to implement Machine Learning algorithms and concepts using Python.
Learn how to build a Predictive Model for a Business case.
Who this book is for
This cookbook is meant for anybody who is passionate about getting into the world of Machine Learning and has a preliminary understanding of the basics of Linear Algebra, Calculus, Probability, and Statistics. This book also serves as a reference guidebook for intermediate Machine Learning practitioners.
Table of Contents
1. Boston Crime
2. World Happiness Report
3. Iris Species
4. Credit Card Fraud Detection
5. Heart Disease UCI
About the Author
Rehan Guha is a researcher by day and an artist by night. He is a scholar-lecturer, an innovator, and a humanitarian-philanthropist. He started out as a coder and developer and now works in research on Machine Learning and Algorithms, with a keen interest in General Science, Technology, Invention & Innovation. The author holds a graduation degree from the Institute of Engineering & Management, Kolkata, and a Postgraduate certification on Deep Learning from the Indian Institute of Technology, Kharagpur (IIT-K), an AICTE-approved FDP course. Rehan's areas of interest include Optimization Problems, Explainable AI, Deep Learning Architecture, Algorithms, Complexity, and Algorithmic Thinking, among others. He has multiple publications in Journals and Open Publications, and he has filed multiple patents for his Innovations and Inventions. At an early age, one of his patents was also demonstrated to the Indian Army. In his career, Rehan has been involved with a variety of Business Verticals, including Banking, Consulting, Law, Insurance, Freight & Logistics, and Telecom.


CHAPTER 1 Boston Crime

Introduction

Everyone has heard that “Data¹ is the new oil,” and data is freely available everywhere, from newspapers to Twitter. Like crude oil, raw data has little value by itself, so we will use different techniques to make sense of the data and extract information² from it.

Let us start with the basics: the O.S.E.M.N. framework, exploratory data analysis (E.D.A.), and some visualization techniques. This chapter mostly covers the basics of data exploration and shows how to implement some techniques for data cleaning as well.

This chapter is important because it serves as the introduction to Machine Learning and Data Science. Readers will gain the skill and confidence to work with data and draw insights from it.

Structure

Types of data
Let’s talk about the Boston dataset
O.S.E.M.N. framework

Objective

In this chapter, we will mostly look at the concept of data analysis and see some techniques for cleaning data. By the end of the chapter, the reader should be able to process the data and have a good understanding of the dataset we are using.

What is Data?

By dictionary definition, data is the “facts and statistics collected together for reference or analysis.” Here, however, we will define data as a set of values of subjects with respect to qualitative or quantitative variables. Data, information, and knowledge are often used interchangeably; however, data becomes information when it is viewed in context or after analysis³.

Types of Data

Structured data - Structured data is generally stored in tabular form and can be kept in a relational database. It can hold names, phone numbers, locations, or metrics such as distance and loan amount, and we can generally query the relational table with SQL.
Semi-structured data - Semi-structured data is similar to structured data, but it does not follow the conventional relational table structure. Formats such as XML and JSON are examples of semi-structured data.
Unstructured data - As the name suggests, unstructured data follows no formal structure or relational table. Examples include free text, tweets from Twitter, and media such as audio and video.

These are some of the building blocks of data types⁴ which are used in machine learning.
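
The three categories above can be illustrated with a minimal Python sketch. All sample values and field names below are invented for illustration and do not come from the book or the Boston dataset.

```python
import json

import pandas as pd

# Structured: rows and columns with a fixed schema (made-up sample values).
structured = pd.DataFrame(
    {"name": ["Asha", "Ben"], "loan_amount": [12000.0, 8500.0]}
)

# Semi-structured: JSON carries its own keys, and records may differ in shape.
semi_structured = json.loads(
    '[{"name": "Asha", "phones": ["555-0100"]}, {"name": "Ben"}]'
)

# Unstructured: free text with no schema at all.
unstructured = "Traffic is terrible near the harbour tonight."

# Semi-structured records can often be flattened into a table for analysis;
# fields missing from a record become NaN.
flattened = pd.json_normalize(semi_structured)
print(flattened)
```

Flattening is one common way to bring semi-structured data into the tabular form that most of the tools in this chapter expect.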

For this chapter, we will use structured data taken from the Boston Police Department.

Let’s talk about the Boston dataset

The Boston crime dataset⁵ is a collection of crime incident reports provided by the Boston Police Department (BPD). It documents the initial details surrounding an incident to which BPD officers respond. The dataset contains records from the new crime incident report system, which includes a reduced set of fields focused on capturing the type of incident as well as when and where it occurred. Records in the new system begin in June 2015.

The first thing to do is get to know the different features in the dataset.

Data Dictionary

Any standard dataset will contain a data dictionary. As per definition, a data dictionary is “a set of information describing the contents, format, and structure of a database and the relationship between its elements, used to control access to and manipulation of the database.” As mentioned before, we are using structured data, so this can be stored in a relational database. Table 1.1 shows the data dictionary with all the details.

Field Name, Data Type, Required	Description
[incident_num] [varchar] (20) NOT NULL,	Internal BPD report number
[offense_code] [varchar] (25) NULL,	Numerical code of offense description
[Offense_Code_Group_Description] [varchar] (80) NULL,	Internal categorization of [offense_description]
[Offense_Description] [varchar] (80) NULL,	The primary descriptor of the incident
[district] [varchar] (10) NULL,	What district the crime was reported in
[reporting_area] [varchar](10) NULL,	RA number associated with the location where the crime was reported from.
[shooting][char] (1) NULL,	Indicates that a shooting took place.
[occurred_on] [datetime2](7) NULL,	Earliest date and time the incident could have taken place
[UCR_Part] [varchar](25) NULL,	Uniform Crime Reporting Part number (1, 2, 3)
[street] [varchar](50) NULL,	Street name where the incident took place

Table 1.1: Data Dictionary
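
A data dictionary can also be used programmatically. The sketch below, which is an illustration rather than the book's own code, transcribes a few lengths and NULL rules from Table 1.1 and checks a small DataFrame against them. The helper name `check_against_dictionary` and the sample rows are hypothetical, and the column names in the downloaded file may be spelled differently from the table.

```python
import pandas as pd

# Maximum lengths and nullability transcribed from Table 1.1 (subset only).
DATA_DICTIONARY = {
    "incident_num": {"max_len": 20, "nullable": False},
    "offense_code": {"max_len": 25, "nullable": True},
    "district": {"max_len": 10, "nullable": True},
    "shooting": {"max_len": 1, "nullable": True},
    "street": {"max_len": 50, "nullable": True},
}


def check_against_dictionary(df, dictionary):
    """Return a list of human-readable problems found in df (hypothetical helper)."""
    problems = []
    for column, rule in dictionary.items():
        if column not in df.columns:
            problems.append(f"{column}: missing column")
            continue
        values = df[column]
        # A NOT NULL field must not contain missing values.
        if not rule["nullable"] and values.isna().any():
            problems.append(f"{column}: contains nulls but is NOT NULL")
        # Non-missing values must fit within the declared varchar/char length.
        too_long = values.dropna().astype(str).str.len() > rule["max_len"]
        if too_long.any():
            problems.append(f"{column}: value longer than {rule['max_len']}")
    return problems


# Invented rows; the second one violates the NOT NULL rule on incident_num.
sample = pd.DataFrame(
    {
        "incident_num": ["I000000001", None],
        "offense_code": ["619", "1402"],
        "district": ["D14", "C11"],
        "shooting": [None, "Y"],
        "street": ["LINCOLN ST", "HECLA ST"],
    }
)
print(check_against_dictionary(sample, DATA_DICTIONARY))
```

A check like this is a cheap way to confirm that a downloaded file really matches its documentation before any analysis starts.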

For any dataset, we need to understand the data and analyze each feature and its contribution.

The best way to know more about the data is to get your hands dirty.

Let’s start with Python and Jupyter Notebook to explore the dataset.

First, we need to set up the machine and install the required packages to get things going. Please refer to the GitHub link:

https://github.com/bpbpublications/Machine-Learning-Cookbook-with-Python.

The entire code can be run online with just a web browser, using Binder. Please use the above GitHub link to find more details about it.

O.S.E.M.N. framework

All Machine Learning and Data Science projects share a basic framework named O.S.E.M.N. (Obtaining, Scrubbing, Exploring, Modeling, INterpreting)⁶. Within this framework, Data Fetching, Data Cleaning, and Data Exploring take up 60% of the pipeline.
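
The first three stages can be sketched as three small functions chained together. This is only an illustrative skeleton on a tiny in-memory table: the function names, column names, and rows are assumptions, and the Modeling and INterpreting stages are omitted.

```python
import io

import pandas as pd

# Invented raw data standing in for a downloaded file.
RAW_CSV = """district,offense
D14,Larceny
,Vandalism
C11,Larceny
"""


def obtain(source):
    # Obtain: read the raw records (here from an in-memory string).
    return pd.read_csv(io.StringIO(source))


def scrub(df):
    # Scrub: drop rows where the district is missing.
    return df.dropna(subset=["district"]).reset_index(drop=True)


def explore(df):
    # Explore: count incidents per offense type.
    return df["offense"].value_counts()


raw = obtain(RAW_CSV)
clean = scrub(raw)
counts = explore(clean)
print(counts)
```

Keeping each stage in its own function makes it easier to rerun one step without repeating the others.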

What is Data Obtaining?

In this chapter, we are using data from the Boston Police Department repository, and downloading it is considered Data Obtaining/Fetching. There can be cases where we need to scrape⁷ the data from websites, media files, log files, etc. All the steps required to gather the data are considered Data Obtaining/Fetching.

After downloading the data from the given URL, we need to load the dataset using pandas and store it in a DataFrame (Figure 1.1) so that we can start cleaning the data for exploration.

Figure 1.1: Data loading

We are using a DataFrame from pandas to store the Boston Crime Dataset. pandas is a popular library used for Machine Learning and Data Science.
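
A minimal sketch of this loading step follows, assuming the data is a CSV file. The column names and rows below are invented stand-ins, and an in-memory string replaces the downloaded file so that the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; the rows are invented for illustration.
csv_text = """INCIDENT_NUMBER,OFFENSE_CODE,DISTRICT,OCCURRED_ON_DATE
I000000001,00619,D14,2018-09-02 13:00:00
I000000002,01402,C11,2018-08-21 00:00:00
"""

df = pd.read_csv(
    io.StringIO(csv_text),  # replace with the path to the downloaded CSV
    dtype={"OFFENSE_CODE": str},  # keep codes as text so leading zeros survive
    parse_dates=["OCCURRED_ON_DATE"],  # parse the timestamp column
)

print(df.shape)   # number of rows and columns
print(df.dtypes)  # inferred type of each column
print(df.head())  # first few records
```

Reading code-like columns as strings mirrors the varchar types in the data dictionary and avoids silently turning `00619` into the number 619.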

What is Data Scrubbing?

As the name suggests, Data Scrubbing⁸ is a process of cleaning the data which will be fit for use in the nex...
