Become a Python Data Analyst
eBook - ePub

Perform exploratory data analysis and gain insight into scientific computing using Python

  1. 178 pages
  2. English

About this book

Enhance your data analysis and predictive modeling skills using popular Python tools

Key Features

  • Cover the fundamental Python libraries for operating on and manipulating data for analysis
  • Use real-world datasets to perform predictive analytics with Python
  • Access modern data analysis techniques and detailed code with scikit-learn and SciPy

Book Description

Python is one of the most common and popular languages preferred by leading data analysts and statisticians for working with massive datasets and complex data visualizations.

Become a Python Data Analyst introduces Python's most essential tools and libraries necessary to work with the data analysis process, right from preparing data to performing simple statistical analyses and creating meaningful data visualizations.

In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to efficiently use the Jupyter Notebook to operate and manipulate data using NumPy and the pandas library. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques.

By the end of this book, you will have hands-on experience performing data analysis with Python.

What you will learn

  • Explore important Python libraries and learn to install the Anaconda distribution
  • Understand the basics of NumPy
  • Produce informative and useful visualizations for analyzing data
  • Perform common statistical calculations
  • Build predictive models and understand the principles of predictive analytics

Who this book is for

Become a Python Data Analyst is for entry-level data analysts, data engineers, and BI professionals who want to make complete use of Python tools for performing efficient data analysis. Prior knowledge of Python programming is necessary to understand the concepts covered in this book.

Visualization and Exploratory Data Analysis

Visualization is a key topic in data science and data analysis, and Python provides many options for creating visualizations for different purposes. In this chapter, we will talk about the two most popular libraries for visualization in Python, namely, matplotlib and seaborn. We will also talk about the visualization capabilities of pandas.
We will discuss the following topics in this chapter:
  • Introducing matplotlib
  • Introducing pyplot
  • Object-oriented interfaces
  • Common customizations
  • Exploratory data analysis with seaborn and pandas
  • Analyzing the variables individually
  • The relationship between variables

Introducing Matplotlib

Matplotlib tries to make easy things easy and hard things possible. It is a plotting library that produces publication-quality figures in a variety of formats and interactive environments. Let's now discuss what matplotlib is, its capabilities, and its basic concepts: figures, subplots (axes), and axis. It can be used for a variety of purposes and in many settings: Python scripts, the Python interpreter and shell, the Jupyter Notebook, web application servers, and graphical user interfaces built with Python.
Now, let's take a look at our Jupyter Notebook, which contains more information about matplotlib. Before doing that, let's first visit matplotlib.org, the project website and the primary online resource for the library's documentation. There we can find examples, frequently asked questions, and the gallery, which deserves a closer look.
When people work with matplotlib, they usually go to the gallery and look for a visualization that approximates what they are trying to do, as shown in the following screenshot:
Let's say that we want to make a box plot. The following screenshot shows a gallery example that compares a violin plot with a box plot. When we have something similar in mind, we can look up the code, tweak it, and use it for our own visualization:
The preceding screenshot shows only part of the code; the entire code is available on the official matplotlib site mentioned earlier.

Terminologies in Matplotlib

Before talking about the main concepts of this library, we will discuss some basic matplotlib terminology: figures, subplots/axes, and axis. The anatomy of a matplotlib plot starts with the figure, as we can see in the following screenshot:
Let's explore the terms mentioned in the preceding screenshot:
  • Figure: The figure is the top-level container in this hierarchy. It is the overall window that contains everything that is drawn. We can have multiple independent figures, and multiple axes in each figure.
  • Axes/Subplot: Most of the plotting is done with respect to an axes, or subplot. A subplot has many components: a plotting area, an x axis, a y axis, tick marks, and so on. Within the x axis, in turn, we have things such as the x label, the x tick marks, and the labels for the tick marks. This is basically the hierarchy that we have in matplotlib.
  • Axis: The figure sits at the top of the hierarchy and, inside the figure, we have subplots. The preceding image has only one subplot, but a figure can contain many. Every subplot has its own elements; most commonly, an x axis and a y axis, each with its own label, tick marks, and tick labels.
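The hierarchy described above can be sketched in a few lines. This is a minimal illustration, not code from the book: the two-subplot layout, the data, and the label strings are assumptions chosen for the example, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt

# One Figure (top-level container) holding two Axes (subplots).
fig, axes = plt.subplots(nrows=1, ncols=2)
left, right = axes

# Plotting is done with respect to one subplot.
left.plot([1, 2, 3, 4])          # illustrative data
left.set_xlabel("x label")       # stored on the x axis of this subplot
left.set_ylabel("y label")

# Each subplot owns its own Axis objects, which manage the label,
# the tick marks, and the tick labels.
x_axis = left.xaxis
y_axis = left.yaxis

print(type(fig).__name__)              # Figure
print(len(fig.axes))                   # 2
print(x_axis.get_label().get_text())   # x label
```

Reading the objects from the top down (fig, then left, then left.xaxis) mirrors the figure, subplot, axis order in the list above.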

Introduction to pyplot

Now, we will start using matplotlib through the pyplot interface, with some examples.
In our Jupyter Notebook, the first thing that we notice is a command consisting of matplotlib inline preceded by the % sign, as shown in the following code block:
%matplotlib inline
This is how we tell the Jupyter Notebook that we want to see the plots inside the notebook. If we don't run this command, the plots may appear in a separate window.
Pyplot is a collection of command-style functions that make matplotlib work like MATLAB. Each function makes some change to a figure, and this figure is considered to be the current figure. For instance, we can create a figure, create a plotting area in a figure, plot a line in a subplot of the figure, and change the labels. When we use the pyplot interface, we have to keep in mind which figure is the current one. Let's take a look at some examples:
  1. The conventional way to import pyplot into the current session is to import matplotlib.pyplot as plt:
import matplotlib.pyplot as plt
  2. Our first command calls the plot function from the pyplot module (imported as plt) and passes it a list of numbers. When we execute the line, the command creates a figure. In the following diagram, we have a figure, even though we cannot see it; inside the figure we have a subplot, and inside this subplot we have a line plot that is just a graphical representation of the numbers in the list:
In the preceding diagram, we can see what this function did, and this figure is considered to be the current figure. Every other function that we call now applies to it. For instance, calling plt.ylabel places a label on the y axis; the label, in this case, is some numbers. Let's run it again and see how the function has made the label appear on the y axis, as shown in the following diagram:
  3. The most commonly used function in the pyplot interface is the plot function, which can take many arguments. For instance, if we pass two lists of ...
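The first two steps above can be sketched as follows. The list values are an assumption made for illustration (the excerpt does not show the numbers used), and the Agg backend is selected only so that the sketch runs outside a notebook; inside Jupyter, %matplotlib inline plays that role.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running as a script
import matplotlib.pyplot as plt

# Passing a single list: if no figure exists yet, plot creates a figure
# and a subplot implicitly, then draws the line. The values are used as
# y coordinates, and x defaults to 0, 1, 2, ...
(line,) = plt.plot([1, 2, 3, 4])   # illustrative numbers

# The figure and subplot that pyplot is currently acting on.
fig = plt.gcf()
ax = plt.gca()

# Later pyplot calls apply to the current figure and subplot.
plt.ylabel("some numbers")

print(ax.get_ylabel())     # some numbers
print(plt.gcf() is fig)    # True
```

Here plt.gcf and plt.gca are used only to make the implicit current figure and subplot visible; the pyplot calls themselves never need to name them.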

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Packt Upsell
  4. Contributor
  5. Preface
  6. The Anaconda Distribution and Jupyter Notebook
  7. Vectorizing Operations with NumPy
  8. Pandas - Everyone's Favorite Data Analysis Library
  9. Visualization and Exploratory Data Analysis
  10. Statistical Computing with Python
  11. Introduction to Predictive Analytics Models
  12. Other Books You May Enjoy

Become a Python Data Analyst by Alvaro Fuentes is available in PDF and/or ePUB format, alongside other books in Computer Science & Data Mining.