Hands-On Exploratory Data Analysis with Python
Perform EDA techniques to understand, summarize, and investigate your data
Suresh Kumar Mukhiya, Usman Ahmed
- 352 pages
- English
About This Book
Discover techniques to summarize the characteristics of your data using PyPlot, NumPy, SciPy, and pandas
Key Features
- Understand the fundamental concepts of exploratory data analysis using Python
- Find missing values in your data and identify the correlation between different variables
- Practice graphical exploratory analysis techniques using Matplotlib and the Seaborn Python package
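This is a minimal sketch of the first two tasks listed above. It assumes pandas and NumPy are installed, and the DataFrame and its column names are hypothetical. It counts missing values per column and then computes a correlation matrix between the numeric variables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame({
    "hours_slept": [7.5, 6.0, np.nan, 8.0, 5.5],
    "heart_rate": [62, 70, 68, np.nan, 75],
    "steps": [8000, 5000, 6500, 9000, 4000],
})

# Number of missing (NaN) values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Pearson correlation between every pair of columns.
# pandas drops missing values pairwise when computing each coefficient.
correlation_matrix = df.corr()
print(correlation_matrix.round(2))
```

`DataFrame.corr()` uses Pearson correlation by default. In this made-up data, `hours_slept` and `steps` are perfectly proportional on the rows where both are present, so their coefficient is 1.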
Book Description
Exploratory Data Analysis (EDA) is an approach to data analysis that applies diverse techniques to gain insights into a dataset. This book will help you gain practical knowledge of the main pillars of EDA: data cleaning, data preparation, data exploration, and data visualization.
You'll start by performing EDA on open source datasets, working through simple to advanced analyses that turn data into meaningful insights. You'll then learn various descriptive statistical techniques to describe the basic characteristics of data and progress to performing EDA on time-series data. As you advance, you'll learn how to implement EDA techniques for model development and evaluation, and how to build predictive models and visualize their results. Using Python, you'll work with real-world datasets to understand data, summarize its characteristics, and visualize it for business intelligence.
By the end of this EDA book, you'll have developed the skills required to carry out a preliminary investigation of any dataset, derive insights from data, present your results with visual aids, and build a model that predicts future outcomes.
What you will learn
- Import, clean, and explore data to perform preliminary analysis using powerful Python packages
- Identify and transform erroneous data using different data wrangling techniques
- Explore the use of multiple regression to describe non-linear relationships
- Discover hypothesis testing and explore techniques of time-series analysis
- Understand and interpret results obtained from graphical analysis
- Build, train, and optimize predictive models to estimate results
- Perform complex EDA techniques on open source datasets
Who this book is for
This EDA book is for anyone interested in data analysis, especially students, statisticians, data analysts, and data scientists. The practical concepts presented in this book can be applied in various disciplines to enhance decision-making processes with data analysis and synthesis. Fundamental knowledge of Python programming and statistical concepts is all you need to get started with this book.
Section 1: The Fundamentals of EDA
- Chapter 1, Exploratory Data Analysis Fundamentals
- Chapter 2, Visual Aids for EDA
- Chapter 3, EDA with Personal Email
- Chapter 4, Data Transformation
Exploratory Data Analysis Fundamentals
- Understanding data science
- The significance of EDA
- Making sense of data
- Comparing EDA with classical and Bayesian analysis
- Software tools available for EDA
- Getting started with EDA
Understanding data science
- Data requirements: An organization can draw on many sources of data. It is important to understand which types of data the organization needs to collect, curate, and store. For example, an application that tracks the sleeping patterns of patients with dementia needs to store data from several types of sensors, such as sleep data, the patient's heart rate, electrodermal activity, and user activity patterns. All of these data points are required to correctly diagnose the patient's mental state, so they are mandatory requirements for the application. In addition, the data must be categorized as numerical or categorical, and its storage and dissemination formats must be defined.
- Data collection: Data collected from several sources must be stored in the correct format and transferred to the right information technology personnel within a company. As mentioned previously, data can be collected from many objects and events using different types of sensors and storage tools.
- Data processing: Preprocessing involves pre-curating the dataset before the actual analysis. Common tasks include correctly exporting the dataset, placing the data in the right tables, structuring it, and exporting it in the correct format.
- Data cleaning: Preprocessed data is still not ready for detailed analysis. It must be checked for incompleteness, duplicates, errors, and missing values. These tasks are performed in the data cleaning stage, which involves matching the correct records, finding inaccuracies in the dataset, understanding the overall data quality, removing duplicate items, and filling in missing values. But how can we identify these anomalies in a dataset? Finding such issues requires analytical techniques, several of which we will learn in Chapter 4, Data Transformation. In short, data cleaning depends on the types of data under study, so it is essential for data scientists and EDA experts to understand different types of datasets. For example, outlier detection methods can be used to clean quantitative data.
- EDA: Exploratory data analysis, as mentioned before, is the stage where we actually star...
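The data requirements and data cleaning steps described above can be sketched in pandas. This is a minimal, hypothetical example that loosely echoes the sleep-tracking scenario, and the column names and values are invented. It works through four steps: a duplicates check, splitting columns into numerical and categorical, a missing-value check with median imputation, and outlier detection.

Median filling and the interquartile range (IQR) rule are common choices that I have picked for illustration. They are not necessarily the techniques the book uses.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; names and values are illustrative only.
df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p2", "p3", "p4", "p5", "p6"],
    "sleep_stage": ["deep", "light", "light", "rem", "light", "deep", "rem"],
    "heart_rate": [62.0, 70.0, 70.0, np.nan, 68.0, 180.0, 66.0],
})

# 1. Duplicates check: drop rows that exactly repeat an earlier row.
df = df.drop_duplicates().reset_index(drop=True)

# 2. Categorize columns as numerical or categorical by dtype.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# 3. Missing value check, then fill numeric gaps with each column's median
#    (one simple imputation strategy among many).
print(df.isnull().sum())
df[numerical_cols] = df[numerical_cols].fillna(df[numerical_cols].median())

# 4. Outlier detection with the IQR rule: flag values outside
#    [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = df["heart_rate"].quantile(0.25)
q3 = df["heart_rate"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["heart_rate"] < q1 - 1.5 * iqr) | (df["heart_rate"] > q3 + 1.5 * iqr)
print(df[is_outlier])
```

The duplicate row for `p2` is removed first, so it cannot skew the median or the quartiles. Only the 180 bpm reading for `p5` falls outside the IQR fences.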