Practical Data Analysis Cookbook
eBook - ePub

  1. 384 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Over 60 practical recipes on data exploration and analysis

About This Book

  • Clean dirty data, extract accurate information, and explore the relationships between variables
  • Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn
  • Find and extract the most important features from your dataset using the most efficient Python libraries

Who This Book Is For

If you are a beginner or intermediate-level professional looking to solve your day-to-day analytical problems with Python, this book is for you. Even with no prior programming or data analytics experience, you will be able to finish each recipe and learn while doing so.

What You Will Learn

  • Read, clean, transform, and store your data using pandas and OpenRefine
  • Understand your data and explore the relationships between variables using pandas and D3.js
  • Explore a variety of techniques to classify and cluster a bank's outbound marketing campaign call data using pandas, mlpy, NumPy, and Statsmodels
  • Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy
  • Predict the output of a power plant with regression models and forecast water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn
  • Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi
  • Scrape web pages using urllib and BeautifulSoup and get to know natural language processing techniques to classify movie ratings using NLTK
  • Study simulation techniques in an example of a gas station with agent-based modeling
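
As a rough illustration of the first item in the list above (read, clean, transform, and store), here is a minimal sketch. It is not taken from the book: it deliberately uses only Python's standard library (`csv`, `io`, `statistics`) rather than pandas or OpenRefine, and the sample data, column names, and the helper names `clean_rows` and `to_csv` are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: column names and values are invented for illustration.
RAW_CSV = """id,city,price
1,Seattle,250000
2,Portland,
2,Portland,
3,Boise,190000
"""


def clean_rows(text):
    """Read CSV text, drop exact duplicate rows, and fill missing prices with the mean."""
    rows = list(csv.DictReader(io.StringIO(text)))

    # Remove exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Impute missing prices with the mean of the observed prices.
    observed = [float(r["price"]) for r in unique if r["price"]]
    fill = mean(observed)
    for r in unique:
        r["price"] = float(r["price"]) if r["price"] else fill
    return unique


def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "city", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(RAW_CSV)
print(to_csv(cleaned))
```

Mean imputation is only one of several possible choices for handling missing values; it is used here because it keeps the sketch short, not because the book prescribes it.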

In Detail

Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the information and communication technology sector, and it is a valuable skill for employees in almost every sector of the economy.

This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interactions analysis, and more.

First, you will learn some easy-to-follow practical techniques on how to read, write, clean, reformat, explore, and understand your data—arguably the most time-consuming (and the most important) tasks for any data scientist.

In the second section, different independent recipes delve into intermediate topics such as classification, clustering, predicting, and more. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be expanded to solve other real-life problems such as building recommendation engines or predictive models.

In the third section, you will explore more advanced topics, ranging from graph theory through natural language processing and discrete choice modeling to simulations. You will also learn to identify the origin of fraud with the help of a graph, scrape websites, and classify movies based on their reviews.

By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer.

Style and approach

This hands-on recipe guide is divided into three sections that tackle real-world data modeling problems faced by data analysts and scientists in their everyday work. Each independent recipe is written in an easy-to-follow, step-by-step fashion.


Table of Contents

Practical Data Analysis Cookbook
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Preparing the Data
Introduction
Reading and writing CSV/TSV files with Python
Getting ready
How to do it…
How it works…
There's more…
See also
Reading and writing JSON files with Python
Getting ready
How to do it…
How it works…
There's more…
See also
Reading and writing Excel files with Python
Getting ready
How to do it…
How it works…
There's more…
See also
Reading and writing XML files with Python
Getting ready
How to do it…
How it works…
Retrieving HTML pages with pandas
Getting ready
How to do it…
How it works…
Storing and retrieving from a relational database
Getting ready
How to do it…
How it works…
There's more…
See also
Storing and retrieving from MongoDB
Getting ready
How to do it…
How it works…
See also
Opening and transforming data with OpenRefine
Getting ready
How to do it…
See also
Exploring the data with OpenRefine
Getting ready
How to do it…
Removing duplicates
Getting ready
How to do it…
Using regular expressions and GREL to clean up data
Getting ready
How to do it…
See also
Imputing missing observations
Getting ready
How to do it…
How it works…
There's more…
Normalizing and standardizing the features
Getting ready
How to do it…
How it works…
Binning the observations
Getting ready
How to do it…
How it works…
There's more…
Encoding categorical variables
Getting ready
How to do it…
How it works…
2. Exploring the Data
Introduction
Producing descriptive statistics
Getting ready
How to do it…
How it works…
There's more…
See also…
Exploring correlations between features
Getting ready
How to do it…
How it works…
See also…
Visualizing the interactions between features
Getting ready
How to do it…
How it works…
See also…
Producing histograms
Getting ready
How to do it…
How it works…
There's more…
See also…
Creating multivariate charts
Getting ready
How to do it…
How it works…
See also…
Sampling the data
Getting ready
How to do it…
How it works…
There's more…
Splitting the dataset into training, cross-validation, and testing
Getting ready
How to do it…
How it works…
There's more…
3. Classification Techniques
Introduction
Testing and comparing the models
Getting ready
How to do it…
How it works…
There's more…
See also
Classifying with Naïve Bayes
Getting ready
How to do it…
How it works…
See also
Using logistic regression as a universal classifier
Getting ready
How to do it…
How it works…
There's more…
See also
Utilizing Support Vector Machines as a classification engine
Getting ready
How to do it…
How it works…
There's more…
Classifying calls with decision trees
Getting ready
How to do it…
How it works…
There's more…
Predicting subscribers with random tree forests
Getting ready
How to do it…
How it works…
There's more…
Employing neural networks to classify calls
Getting ready
How to do it…
How it works…
There's more…
See also
4. Clustering Techniques
Introduction
Assessing the performance of a clustering method
Getting ready
How to do it…
How it works…
See also…
Clustering data with k-means algorithm
Getting ready
How to do it…
How it works…
There's more…
See also…
Finding an optimal number of clusters for k-means
Getting ready
How to do it…
How it works…
There's more…
Discovering clusters with mean shift clustering model
Getting ready
How to do it…
How it works…
See also…
Building fuzzy clustering model with c-means
Getting ready
How to do it…
How it works…
Using hierarchical model to cluster your data
Getting ready
How to do it…
How it works…
There's more…
See also…
Finding groups of potential subscribers with DBSCAN and BIRCH algorithms
Getting ready
How to do it…
How it works…
See also…
5. Reducing Dimensions
Introduction
Creating three-dimensional scatter plots to present principal components
Getting ready
How to do it…
How it works…
Reducing the dimensions using the kernel version of PCA
Getting ready
How to do it…
How it works…
There's more…
See also
Using Principal Component Analysis to find things that matter
Getting ready
How to do it…
How it works…
There's more…
See also
Finding the principal components in your data using randomized PCA
Getting ready
How to do it…
How it works…
There's more…
Extracting the useful dimensions using Linear Discriminant Analysis
Getting ready
How to do it…
How it works…
Using various dimension reduction techniques to classify calls using the k-Nearest Neighbors classification model
Getting ready
How to do it…
How it works…
6. Regression Methods
Introduction
Identifying and tackling multicollinearity
Getting ready
How to do it…
How it works…
There's more…
Building Linear Regression model
Getting ready
How to do it…
How it works…
There's more…
Using OLS to forecast how much electricity can be produced
Getting ready
How to do it…
How it works…
There's more…
See also
Estimating the output of an electric plant using CART
Getting ready
How to do it…
How it works…
There's more…
See also
Employing the kNN model in a regression problem
Getting ready
How to do it…
How it works…
Applying the Random Forest model to a regression analysis
Getting ready
How to do it…
How it ...

Practical Data Analysis Cookbook by Tomasz Drabas is available in PDF and/or ePUB format and is listed under Computer Science & Data Modelling & Design.