Hands-On Data Analysis with Pandas
eBook - ePub

Hands-On Data Analysis with Pandas

A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition

Stefanie Molin

Partager le livre
  1. 788 pages
  2. English
  3. ePUB (adapté aux mobiles)
  4. Disponible sur iOS et Android
eBook - ePub

Hands-On Data Analysis with Pandas

A Python data science handbook for data collection, wrangling, analysis, and visualization, 2nd Edition

Stefanie Molin

DĂ©tails du livre
Aperçu du livre
Table des matiĂšres
Citations

À propos de ce livre

Get to grips with pandas by working with real datasets and master data discovery, data manipulation, data preparation, and handling data for analytical tasks

Key Features

  • Perform efficient data analysis and manipulation tasks using pandas 1.x
  • Apply pandas to different real-world domains with the help of step-by-step examples
  • Make the most of pandas as an effective data exploration tool

Book Description

Extracting valuable business insights is no longer a 'nice-to-have', but an essential skill for anyone who handles data in their enterprise. Hands-On Data Analysis with Pandas is here to help beginners and those who are migrating their skills into data science get up to speed in no time.This book will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data.This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision making – valuable knowledge that can be applied across multiple domains.

What you will learn

  • Understand how data analysts and scientists gather and analyze data
  • Perform data analysis and data wrangling using Python
  • Combine, group, and aggregate data from multiple sources
  • Create data visualizations with pandas, matplotlib, and seaborn
  • Apply machine learning algorithms to identify patterns and make predictions
  • Use Python data science libraries to analyze real-world datasets
  • Solve common data representation and analysis problems using pandas
  • Build Python scripts, modules, and packages for reusable analysis code

Who this book is for

This book is for data science beginners, data analysts, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. Data scientists looking to implement pandas in their machine learning workflow will also find plenty of valuable know-how as they progress.You'll find it easier to follow along with this book if you have a working knowledge of the Python programming language, but a Python crash-course tutorial is provided in the code bundle for anyone who needs a refresher.

]]>

Foire aux questions

Comment puis-je résilier mon abonnement ?
Il vous suffit de vous rendre dans la section compte dans paramĂštres et de cliquer sur « RĂ©silier l’abonnement ». C’est aussi simple que cela ! Une fois que vous aurez rĂ©siliĂ© votre abonnement, il restera actif pour le reste de la pĂ©riode pour laquelle vous avez payĂ©. DĂ©couvrez-en plus ici.
Puis-je / comment puis-je télécharger des livres ?
Pour le moment, tous nos livres en format ePub adaptĂ©s aux mobiles peuvent ĂȘtre tĂ©lĂ©chargĂ©s via l’application. La plupart de nos PDF sont Ă©galement disponibles en tĂ©lĂ©chargement et les autres seront tĂ©lĂ©chargeables trĂšs prochainement. DĂ©couvrez-en plus ici.
Quelle est la différence entre les formules tarifaires ?
Les deux abonnements vous donnent un accĂšs complet Ă  la bibliothĂšque et Ă  toutes les fonctionnalitĂ©s de Perlego. Les seules diffĂ©rences sont les tarifs ainsi que la pĂ©riode d’abonnement : avec l’abonnement annuel, vous Ă©conomiserez environ 30 % par rapport Ă  12 mois d’abonnement mensuel.
Qu’est-ce que Perlego ?
Nous sommes un service d’abonnement Ă  des ouvrages universitaires en ligne, oĂč vous pouvez accĂ©der Ă  toute une bibliothĂšque pour un prix infĂ©rieur Ă  celui d’un seul livre par mois. Avec plus d’un million de livres sur plus de 1 000 sujets, nous avons ce qu’il vous faut ! DĂ©couvrez-en plus ici.
Prenez-vous en charge la synthÚse vocale ?
Recherchez le symbole Écouter sur votre prochain livre pour voir si vous pouvez l’écouter. L’outil Écouter lit le texte Ă  haute voix pour vous, en surlignant le passage qui est en cours de lecture. Vous pouvez le mettre sur pause, l’accĂ©lĂ©rer ou le ralentir. DĂ©couvrez-en plus ici.
Est-ce que Hands-On Data Analysis with Pandas est un PDF/ePUB en ligne ?
Oui, vous pouvez accĂ©der Ă  Hands-On Data Analysis with Pandas par Stefanie Molin en format PDF et/ou ePUB ainsi qu’à d’autres livres populaires dans Informatica et Elaborazione di dati. Nous disposons de plus d’un million d’ouvrages Ă  dĂ©couvrir dans notre catalogue.

Informations

Année
2021
ISBN
9781800565913

Section 1: Getting Started with Pandas

Our journey begins with an introduction to data analysis and statistics, which will lay a strong foundation for the concepts we will cover throughout the book. Then, we will set up our Python data science environment, which contains everything we will need to work through the examples, and get started with learning the basics of pandas.
This section comprises the following chapters:
  • Chapter 1, Introduction to Data Analysis
  • Chapter 2, Working with Pandas DataFrames

Chapter 1: Introduction to Data Analysis

Before we can begin our hands-on introduction to data analysis with pandas, we need to learn about the fundamentals of data analysis. Those who have ever looked at the documentation for a software library know how overwhelming it can be if you have no clue what you are looking for. Therefore, it is essential that we master not only the coding aspect but also the thought process and workflow required to analyze data, which will prove the most useful in augmenting our skill set in the future.
Much like the scientific method, data science has some common workflows that we can follow when we want to conduct an analysis and present the results. The backbone of this process is statistics, which gives us ways to describe our data, make predictions, and also draw conclusions about it. Since prior knowledge of statistics is not a prerequisite, this chapter will give us exposure to the statistical concepts we will use throughout this book, as well as areas for further exploration.
After covering the fundamentals, we will get our Python environment set up for the remainder of this book. Python is a powerful language, and its uses go way beyond data science: building web applications, software, and web scraping, to name a few. In order to work effectively across projects, we need to learn how to make virtual environments, which will isolate each project's dependencies. Finally, we will learn how to work with Jupyter Notebooks in order to follow along with the text.
The following topics will be covered in this chapter:
  • The fundamentals of data analysis
  • Statistical foundations
  • Setting up a virtual environment

Chapter materials

All the files for this book are on GitHub at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition. While having a GitHub account isn't necessary to work through this book, it is a good idea to create one, as it will serve as a portfolio for any data/coding projects. In addition, working with Git will provide a version control system and make collaboration easy.
Tip
Check out this article to learn some Git basics: https://www.freecodecamp.org/news/learn-the-basics-of-git-in-under-10-minutes-da548267cc91/.
In order to get a local copy of the files, we have a few options (ordered from least useful to most useful):
  • Download the ZIP file and extract the files locally.
  • Clone the repository without forking it.
  • Fork the repository and then clone it.
This book includes exercises for every chapter; therefore, for those who want to keep a copy of their solutions along with the original content on GitHub, it is highly recommended to fork the repository and clone the forked version. When we fork a repository, GitHub will make a repository under our own profile with the latest version of the original. Then, whenever we make changes to our version, we can push the changes back up. Note that if we simply clone, we don't get this benefit.
The relevant buttons for initiating this process are circled in the following screenshot:
Figure 1.1 – Getting a local copy of the code for following along
Figure 1.1 – Getting a local copy of the code for following along
Important note
The cloning process will copy the files to the current working directory in a folder called Hands-On-Data-Analysis-with-Pandas-2nd-edition. To make a folder to put this repository in, we can use mkdir my_folder && cd my_folder. This will create a new folder (directory) called my_folder and then change the current directory to that folder, after which we can clone the repository. We can chain these two commands (and any number of commands) together by adding && in between them. This can be thought of as and then (provided the first command succeeds).
This repository has folders for each chapter. This chapter's materials can be found at https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition/tree/master/ch_01. While the bulk of this chapter doesn't involve any coding, feel free to follow along in the introduction_to_data_analysis.ipynb notebook on the GitHub website until we set up our environment toward the end of the chapter. After we do so, we will use the check_your_environment.ipynb notebook to get familiar with Jupyter Notebooks and to run some checks to make sure that everything is set up properly for the rest of this book.
Since the code that's used to generate the content in these notebooks is not the main focus of this chapter, the majority of it has been separated into the visual_aids package, which is used to create visuals for explaining concepts throughout the book, and the check_environment.py file. If you choose to inspect these files, don't be overwhelmed; everything that's relevant to data science will be covered in this bo...

Table des matiĂšres