eBook - ePub

Hands-On Data Preprocessing in Python

Name: Hands-On Data Preprocessing in Python
ISBN: 9781801079952

Roy Jafari

602 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Get your raw data cleaned up and ready for processing to design better data analytic solutions

Key Features

• Develop the skills to perform data cleaning, data integration, data reduction, and data transformation
• Make the most of your raw data with powerful data transformation and massaging techniques
• Perform thorough data cleaning, including dealing with missing values and outliers

Book Description

Hands-On Data Preprocessing is a primer on the best data cleaning and preprocessing techniques, written by an expert who's developed college-level courses on data preprocessing and related subjects. With this book, you'll be equipped with the optimum data preprocessing techniques from multiple perspectives, ensuring that you get the best possible insights from your data. You'll learn about different technical and analytical aspects of data preprocessing – data collection, data cleaning, data integration, data reduction, and data transformation – and get to grips with implementing them using the open source Python programming environment. The hands-on examples and easy-to-follow chapters will help you gain a comprehensive articulation of data preprocessing, its whys and hows, and identify opportunities where data analytics could lead to more effective decision making. As you progress through the chapters, you'll also understand the role of data management systems and technologies for effective analytics and how to use APIs to pull data.

By the end of this Python data preprocessing book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, reduction, and transformation techniques, and handle outliers or missing values to effectively prepare data for analytic tools.

What you will learn

• Use Python to perform analytics functions on your data
• Understand the role of databases and how to effectively pull data from databases
• Perform data preprocessing steps defined by your analytics goals
• Recognize and resolve data integration challenges
• Identify the need for data reduction and execute it
• Detect opportunities to improve analytics with data transformation

Who this book is for

This book is for junior and senior data analysts, business intelligence professionals, engineering undergraduates, and data enthusiasts looking to perform preprocessing and data cleaning on large amounts of data. You don't need any prior experience with data preprocessing to get started with this book. However, basic programming skills, such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python and simple analytics experience, are a prerequisite.

Publisher

Packt Publishing

Year

2022

Edition

eBook ISBN

9781801079952

Topic

Computer Science

Subtopic

Data Processing

Part 1: Technical Needs

After reading this part of the book, you will be able to use Python to effectively manipulate data.

This part comprises the following chapters:

Chapter 1, Review of the Core Modules of NumPy and Pandas
Chapter 2, Review of Another Core Module – Matplotlib
Chapter 3, Data – What Is It Really?
Chapter 4, Databases

Chapter 1: Review of the Core Modules of NumPy and Pandas

The NumPy and Pandas modules can meet your needs for the majority of data analytics and data preprocessing tasks. Before we start reviewing these two valuable modules, I would like to let you know that this chapter is not meant to be a comprehensive teaching guide to them, but rather a collection of concepts, functions, and examples that will be invaluable as we cover data analytics and data preprocessing in the subsequent chapters.

In this chapter, we will first review the Jupyter Notebook and its capability as an excellent coding User Interface (UI). Next, we will review the most relevant data analytic resources of the NumPy and Pandas Python modules.

The following topics will be covered in this chapter:

Overview of the Jupyter Notebook
Are we analyzing data via computer programming?
Overview of the basic functions of NumPy
Overview of Pandas

Technical requirements

The easiest way to get started with Python programming is by installing Anaconda Navigator. It is open source software that brings together many useful open source tools for developers. You can download Anaconda Navigator by following this link: https://www.anaconda.com/products/individual.

We will be using Jupyter Notebook throughout this book. Jupyter Notebook is one of the open source tools that Anaconda Navigator provides. Anaconda Navigator also installs a Python version on your computer. So, following Anaconda Navigator's easy installation, all you need to do is open Anaconda Navigator and then select Jupyter Notebook.
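Once Jupyter Notebook is open, a quick check along the following lines can confirm which Python version is running and whether NumPy and Pandas can be imported. This is a minimal sketch that is not taken from the book: the helper name check_modules is hypothetical, and the code relies only on the Python standard library.

```python
import sys
from importlib.util import find_spec


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


# Report the version of the Python interpreter running this notebook
print("Python version:", sys.version.split()[0])

# Check for the two modules reviewed in this chapter
for name, available in check_modules(["numpy", "pandas"]).items():
    print(f"{name}: {'available' if available else 'not found'}")
```

If either module is reported as not found, the installation may not have completed as expected.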

You will be able to find all of the code and datasets used in this book in a GitHub repository created exclusively for this book. To find the repository, click on the following link: https://github.com/PacktPublishing/Hands-On-Data-Preprocessing-in-Python. Each chapter in this book will have a folder that contains all of the code and datasets that were used in the chapter.

Overview of the Jupyter Notebook

The Jupyter Notebook is becoming increasingly popular as a successful User Interface (UI) for Python programming. As a UI, the Jupyter Notebook provides an interactive environment where you can run your Python code, see immediate outputs, and take notes.

Fernando Pérez and Brian Granger, the architects of the Jupyter Notebook, outline the following reasons in terms of what they were looking for in an innovative programming UI:

Space for individual exploratory work
Space for collaboration
Space for learning and education

If you have used the Jupyter Notebook already, you can attest that it delivers on all these promises, and if you have not yet used it, I have good news for you: we will be using Jupyter Notebook for the entirety of this book. Some of the code that I will be sharing will be in the form of screenshots from the Jupyter Notebook UI.

The UI design of the Jupyter Notebook is very simple. You can think of it as one column of material. These materials could be under code chunks or Markdown chunks. The solution development and the actual coding happen under the code chunks, whereas notes for yourself or other developers are presented under Markdown chunks. The following screenshot shows an example of both a Markdown chunk and a code chunk. You can see that the code chunk has been executed, and the printed output is shown immediately after the code chunk:

Figure 1.1 – Code for printing Hello World in a Jupyter notebook
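Because the screenshot is not reproduced here, the following is a minimal sketch of what a code chunk like the one in Figure 1.1 might contain. The exact code in the book's figure may differ, and the variable name message is an assumption made for illustration.

```python
# Contents of a single code chunk. Running the chunk shows the
# printed text immediately beneath it in the notebook.
message = "Hello World"
print(message)
```

A Markdown chunk, by contrast, holds formatted notes rather than executable code, so running it only renders the text.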

To create a new chunk, you can click on the + sign on the top ribbon of the UI. The newly added chunk will be a code chunk by default. You can switch the code chunk to a Markdown chunk by using the drop-down list on the top ribbon. Moreover, you can move the chunks up or down by using the up and down arrows on the ribbon. You can see these three buttons in the following screenshot:

Figure 1.2 – Jupyter Note...

Hands-On Data Preprocessing in Python
Contributors
Preface
Part 1: Technical Needs
Chapter 1: Review of the Core Modules of NumPy and Pandas
Chapter 2: Review of Another Core Module – Matplotlib
Chapter 3: Data – What Is It Really?
Chapter 4: Databases
Part 2: Analytic Goals
Chapter 5: Data Visualization
Chapter 6: Prediction
Chapter 7: Classification
Chapter 8: Clustering Analysis
Part 3: The Preprocessing
Chapter 9: Data Cleaning Level I – Cleaning Up the Table
Chapter 10: Data Cleaning Level II – Unpacking, Restructuring, and Reformulating the Table
Chapter 11: Data Cleaning Level III – Missing Values, Outliers, and Errors
Chapter 12: Data Fusion and Data Integration
Chapter 13: Data Reduction
Chapter 14: Data Transformation and Massaging
Part 4: Case Studies
Chapter 15: Case Study 1 – Mental Health in Tech
Chapter 16: Case Study 2 – Predicting COVID-19 Hospitalizations
Chapter 17: Case Study 3: United States Counties Clustering Analysis
Chapter 18: Summary, Practice Case Studies, and Conclusions
Other Books You May Enjoy
