eBook - ePub

Machine Learning for Healthcare Analytics Projects

Name: Machine Learning for Healthcare Analytics Projects
Author: Eduonix Learning Solutions

Build smart AI applications using neural network methodologies across the healthcare vertical market

Eduonix Learning Solutions

134 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About This Book

Create real-world machine learning solutions using NumPy, pandas, matplotlib, and scikit-learn

Key Features

Develop a range of healthcare analytics projects using real-world datasets
Implement key machine learning algorithms using a range of libraries from the Python ecosystem
Accomplish intermediate-to-complex tasks by building smart AI applications using neural network methodologies

Book Description

Machine Learning (ML) has changed the way organizations and individuals use data to improve the efficiency of a system. ML algorithms allow strategists to deal with a variety of structured, unstructured, and semi-structured data. Machine Learning for Healthcare Analytics Projects is packed with new approaches and methodologies for creating powerful solutions for healthcare analytics.

This book will teach you how to implement key machine learning algorithms and walk you through their use cases by employing a range of libraries from the Python ecosystem. You will build five end-to-end projects to evaluate the efficiency of Artificial Intelligence (AI) applications for carrying out simple-to-complex healthcare analytics tasks. With each project, you will gain new insights, which will then help you handle healthcare data efficiently. As you make your way through the book, you will use ML to detect cancer in a set of patients using support vector machine (SVM) and k-nearest neighbors (KNN) models. In the final chapters, you will create a deep neural network in Keras to predict the onset of diabetes in a large dataset of patients. You will also learn how to predict heart disease using neural networks.

By the end of this book, you will have learned how to address long-standing challenges, provide specialized solutions for how to deal with them, and carry out a range of cognitive tasks in the healthcare domain.

What you will learn

Explore super imaging and natural language processing (NLP) to classify DNA sequencing
Detect cancer based on the cell information provided to the SVM
Apply supervised learning techniques to diagnose autism spectrum disorder (ASD)
Implement a deep learning grid and deep neural networks for detecting diabetes
Analyze data from blood pressure, heart rate, and cholesterol level tests using neural networks
Use ML algorithms to detect autistic disorders

Who this book is for

Machine Learning for Healthcare Analytics Projects is for data scientists, machine learning engineers, and healthcare professionals who want to implement machine learning algorithms to build smart AI applications. Basic knowledge of Python or any programming language is expected to get the most from this book.

Information

Publisher: Packt Publishing
Year: 2018
ISBN: 9781789532524
Edition: 1st
Topic: Computer Science
Subtopic: Natural Language Processing

Diabetes Onset Detection

The far-ranging developments in healthcare over the past few years have produced a huge collection of data that can be used for analysis. Neural networks trained on such data can help predict the onset of various illnesses before they occur. In this chapter, we are going to use a deep neural network and a grid search to predict the onset of diabetes for a set of patients. We will learn about deep neural networks, the hyperparameters that are used to tune them, and how to choose suitable values for each.

We will cover the following topics in this chapter:

Detecting diabetes using a deep learning grid search
Introduction to the dataset
Building a Keras model
Performing a grid search using scikit-learn
Reducing overfitting using dropout regularization
Finding the optimal hyperparameters
Generating predictions using optimal hyperparameters

Detecting diabetes using a grid search

We will be predicting diabetes for a set of patients by using a deep learning algorithm, which we will tune with a grid search to find the optimal hyperparameters. We are going to be doing this project in Jupyter Notebook, as follows:

Start by opening up Command Prompt in Windows or Terminal in Linux systems. We will navigate to our project directory using the cd command.
Our next step is to open the Jupyter Notebook by typing the following command:

jupyter notebook

Alternatively, you can use the jupyter lab command to open an instance of JupyterLab, a newer interface for working with notebooks.

Once the Notebook is open, we will rename the unnamed file to Deep Learning Grid Search.
We will then import our packages using general import statements. We will print the version numbers, as shown in the following screenshot:
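A version check along these lines might look like the following sketch. It is not the book's own listing: the helper name installed_versions is hypothetical, and it reads the installed package metadata rather than importing each package, so it also runs on a machine where Keras or Theano is missing.

```python
from importlib.metadata import PackageNotFoundError, version
import sys


def installed_versions(distributions):
    """Return {distribution name: version string} without importing the packages."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = "not installed"
    return found


print("Python:", sys.version.split()[0])
# Distribution names as used by pip; scikit-learn is imported as "sklearn".
for name, ver in installed_versions(
    ["pandas", "numpy", "scikit-learn", "keras", "theano"]
).items():
    print(f"{name}: {ver}")
```

Recording the versions at the top of the notebook makes it easier to reproduce the results later, since the Keras backend options depend on the installed release.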

Keras can run on either of two backends: TensorFlow and Theano. Both are deep learning packages; we will be using Theano in this chapter. To switch from TensorFlow to Theano, perform the following steps:

Go to the .keras folder that is present in the Windows Users folder. We can navigate to this folder using C:\Users\<yourusername>\.keras. This folder contains a datasets folder and a keras.json file, as shown in the following screenshot:

If you open up the keras.json file in Notepad, you'll see the following details:

In the preceding screenshot, we can see that Keras is currently using the TensorFlow backend.

Since we will be using Theano, change the backend variable to theano. We are now all set to continue.

If you were using TensorFlow previously, you might have to install Theano first.
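The same edit can be scripted instead of made by hand in Notepad. The following is a minimal sketch, assuming that keras.json is plain JSON with a "backend" key as described above; the helper name set_keras_backend is hypothetical, and the demonstration writes to a temporary file rather than the real file in the .keras folder.

```python
import json
import tempfile
from pathlib import Path


def set_keras_backend(config_path, backend):
    """Rewrite the "backend" entry of a keras.json file and return the new config."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    config["backend"] = backend
    config_path.write_text(json.dumps(config, indent=4))
    return config


# Demonstrate on a throwaway file; the stand-in content holds only the
# "backend" entry, which is the one setting the text refers to.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "keras.json"
    demo_path.write_text(json.dumps({"backend": "tensorflow"}))
    updated = set_keras_backend(demo_path, "theano")
    print(updated["backend"])
```

To apply it to a real installation, the path would be Path.home() / ".keras" / "keras.json". Whether Theano is accepted as a backend depends on the Keras version that is installed.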

We will now change the naming convention for pandas and numpy, so that we can use their abbreviated terms in the future. This can be done using the following lines of code:

import pandas as pd
import numpy as np

Introduction to the dataset

Our next step is to import the Pima Indians diabetes dataset, which contains the details of 768 patients:

The dataset that we need can be found at https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv. We can import it by using the following line:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

If we navigate to the preceding URL, we can see a lot of raw information. Once we have imported the dataset, we have to define column names. We will do this using the following lines of code:

names = ['n_pregnant', 'glucose_concentration', 'blood_pressure (mm Hg)', 'skin_thickness (mm)', 'serum_insulin (mu U/ml)', 'BMI', 'pedigree_function', 'age', 'class']

As can be seen in the preceding code block, we have several parameters, including blood_pressure, age, and BMI.

Once we have defined the names of the columns, we have to read all the data into a pandas DataFrame. Since our dataset is in CSV format, we can use the pd.read_csv() function to do this, as shown in the following screenshot:
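A minimal sketch of this loading step follows. The url and names values are taken from the text; the two rows in sample_csv are made-up stand-ins (not records from the dataset) so that the example also runs without network access.

```python
import io

import pandas as pd

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['n_pregnant', 'glucose_concentration', 'blood_pressure (mm Hg)',
         'skin_thickness (mm)', 'serum_insulin (mu U/ml)', 'BMI',
         'pedigree_function', 'age', 'class']

# With network access, the file can be read straight from the URL.
# Passing names= tells pandas that the file itself has no header row:
# df = pd.read_csv(url, names=names)

# Offline stand-in: two made-up rows in the same headerless, comma-separated layout.
sample_csv = io.StringIO(
    "2,120,70,30,0,28.5,0.45,35,0\n"
    "4,150,80,0,130,33.1,0.60,52,1\n"
)
df = pd.read_csv(sample_csv, names=names)

print(df.shape)
print(df.describe().loc[["mean", "min", "max"]])
```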

We will now have a look at the dataset by using the describe() function, as shown in the following screenshot:

As shown in the preceding screenshot, we have 768 instances for each column: eight feature columns plus the class label. The describe() output gives us various measures for each column, including mean, min, std, and max. n_pregnant, for example, ranges from 0 up to a maximum of 17 pregnancies. However, in most of the columns, there are quite a few places where the value is zero, which may represent missing data.

Having missing data will throw off our algorithm's accuracy. Let's deal with this first.

Preprocessing the dataset

Since we have missing data values, we will have to sort through the data to understand what's going on:

To do this, we will use the following code snippet to pull up a DataFrame where the glucose concentration of a patient is listed as 0:

df[df['glucose_concentration'] == 0]

This provides us with a DataFrame, as seen in the following screenshot:

Here, we can see that there are five cases where the glucose_concentration is 0, meaning that it is likely that there is some missing information in the dataset. This will hinder the accuracy of our algorithm, so we have to preprocess the data.
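The same check can be run on several columns at once by counting the zeros in each. The following is a sketch on a small made-up DataFrame (the values are illustrative, not taken from the dataset); the name measurement_columns is an assumption introduced here.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    'glucose_concentration': [148, 0, 183, 89],
    'blood_pressure (mm Hg)': [72, 66, 0, 66],
    'skin_thickness (mm)': [35, 29, 0, 23],
    'serum_insulin (mu U/ml)': [0, 0, 0, 94],
    'BMI': [33.6, 26.6, 23.3, 28.1],
})

measurement_columns = list(df.columns)

# Comparing to 0 yields a boolean frame; summing counts the True values per column.
zero_counts = (df[measurement_columns] == 0).sum()
print(zero_counts)
```

A column with many zeros, such as the insulin column in this toy example, would lose a large share of its rows if every zero were treated as missing and dropped, which is worth knowing before preprocessing.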

We're going to mark the missing values as NaN and then drop the affected rows. To do this, we first define the columns we want to check: all of the columns except n_pregnant, pedigree_function, age, and class. This can be done as follows:

columns = ['glucose_concentration', 'blood_pressure (mm Hg)', 'skin_thickness (mm)', 'serum_insulin (mu U/ml)', 'BMI']

After defining the columns, we have to replace all the zero values with NaN. This can be done as follows:

for col in columns:
    # Assign the result back rather than calling replace(..., inplace=True) on a column selection
    df[col] = df[col].replace(0, np.nan)
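The mark-then-drop approach described above can be sketched end to end on a small made-up DataFrame. The rows are illustrative rather than taken from the dataset, and the use of dropna() for the dropping step is an assumption, since the text only says that the marked rows are dropped.

```python
import numpy as np
import pandas as pd

columns = ['glucose_concentration', 'blood_pressure (mm Hg)',
           'skin_thickness (mm)', 'serum_insulin (mu U/ml)', 'BMI']

# Made-up rows for illustration only; n_pregnant is deliberately left out of
# `columns`, so its zero in the last row is kept as a real value.
df = pd.DataFrame({
    'n_pregnant': [6, 1, 8, 0],
    'glucose_concentration': [148, 0, 183, 89],
    'blood_pressure (mm Hg)': [72, 66, 0, 66],
    'skin_thickness (mm)': [35, 29, 0, 23],
    'serum_insulin (mu U/ml)': [0, 0, 0, 94],
    'BMI': [33.6, 26.6, 23.3, 28.1],
})

cleaned = df.copy()
# Mark zeros in the selected columns as missing values.
cleaned[columns] = cleaned[columns].replace(0, np.nan)
# Drop every row that now contains at least one NaN.
cleaned = cleaned.dropna()

print(len(df), "rows before,", len(cleaned), "rows after")
```

Dropping rows is the simplest option, but it can discard a large part of the data when one column has many zeros.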

We will then take another look at the DataFrame to ensure that the preceding commands have worked. We can do this with the describe() f...
