eBook - ePub

Practical Data Analysis Using Jupyter Notebook

Name: Practical Data Analysis Using Jupyter Notebook
ISBN: 9781838825096

Learn how to speak the language of data by extracting useful and actionable insights using Python

Marc Wintjen

322 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Understand data analysis concepts to make accurate decisions based on data using Python programming and Jupyter Notebook

Key Features

Find out how to use Python code to extract insights from data using real-world examples
Work with structured data and free text sources to answer questions and add value using data
Perform data analysis from scratch with the help of clear explanations for cleaning, transforming, and visualizing data

Book Description

Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data.

After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps.

Later chapters teach you about storytelling with data using charts, such as histograms and scatter plots. As you advance, you'll understand how to work with unstructured data using natural language processing (NLP) techniques to perform sentiment analysis. All the knowledge you gain will help you discover key patterns and trends in data using real-world examples. In addition to this, you will learn how to handle data of varying complexity to perform efficient data analysis using modern Python libraries.

By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.

What you will learn

Understand the importance of data literacy and how to communicate effectively using data
Find out how to use Python packages such as NumPy, pandas, Matplotlib, and the Natural Language Toolkit (NLTK) for data analysis
Wrangle data and create DataFrames using pandas
Produce charts and data visualizations using time-series datasets
Discover relationships and how to join data together using SQL
Use NLP techniques to work with unstructured data to create sentiment analysis models
Discover patterns in real-world datasets that provide accurate insights

Who this book is for

This book is for aspiring data analysts and data scientists looking for hands-on tutorials and real-world examples to understand data analysis concepts using SQL, Python, and Jupyter Notebook. Anyone looking to evolve their skills to become data-driven personally and professionally will also find this book useful. No prior knowledge of data analysis or programming is required to get started with this book.

Publisher

Packt Publishing

Year

2020

Print ISBN

9781838826031

Edition

eBook ISBN

9781838825096

Topic

Computer Science

Subtopic

Artificial Intelligence (AI) & Semantics

Index

Computer Science

Section 1: Data Analysis Essentials

In this section, we will learn how to speak the language of data by extracting useful and actionable insights from data using Python and Jupyter Notebook. We'll begin with the fundamentals of data analysis and work with the right tools to help you analyze data effectively. After your workspace has been set up, we'll learn how to work with data using two popular open source libraries available in Python: NumPy and pandas. This will lay the foundation for you to understand data so that you can prepare for Section 2: Solutions for Data Discovery.

This section includes the following chapters:

Chapter 1, Fundamentals of Data Analysis
Chapter 2, Overview of Python and Installing Jupyter Notebook
Chapter 3, Getting Started with NumPy
Chapter 4, Creating Your First pandas DataFrame
Chapter 5, Gathering and Loading Data in Python

Fundamentals of Data Analysis

Welcome and thank you for reading my book. I'm excited to share my passion for data and I hope to provide the resources and insights to fast-track your journey into data analysis. My goal is to educate, mentor, and coach you throughout this book on the techniques used to become a top-notch data analyst. During this process, you will get hands-on experience using the latest open source technologies available such as Jupyter Notebook and Python. We will stay within that technology ecosystem throughout this book to avoid confusion. However, you can be confident the concepts and skills learned are transferable across open source and vendor solutions with a focus on all things data.

In this chapter, we will cover the following:

The evolution of data analysis and why it is important
What makes a good data analyst?
Understanding data types and why they are important
Data classifications and data attributes explained
Understanding data literacy

The evolution of data analysis and why it is important

To begin, we should define what data is. You will find varying definitions, but I would define data as the digital persistence of facts, knowledge, and information consolidated for reference or analysis. The focus of my definition should be the word persistence, because digital facts remain even after the computers used to create them are powered down, and they are retrievable for future use. Rather than focus on the formal definition, let's discuss the world of data and how it impacts our daily lives. Whether you are reading a review to decide which product to buy or viewing the price of a stock, consuming information has become significantly easier, allowing you to make informed, data-driven decisions.

Data has become woven into products and services across every industry, from farming to smartphones. For example, America's Grow-a-Row, a New Jersey farm-to-food-bank charity, donates over 1.5 million pounds of fresh produce each year to feed people in need throughout the region, according to its annual report. America's Grow-a-Row has thousands of volunteers and uses data to maximize production yields during the harvest season.

As the demand for consuming data has increased, so has the supply side, the production of data. Data production has grown in scale as technology has evolved. I'll discuss this in more detail shortly, but this large-scale consumption and production can be summarized as big data. A National Institute of Standards and Technology report defined big data as consisting of extensive datasets, primarily in the characteristics of volume, velocity, and/or variability, that require a scalable architecture for efficient storage, manipulation, and analysis.

This explosion of big data is characterized by the 3Vs, which are Volume, Velocity, and Variety, and has become a widely accepted concept among data professionals:

Volume is the quantity of data stored in any format, such as image files, movies, and database transactions, and it is measured in gigabytes, terabytes, or even zettabytes. To give context, you can store hundreds of thousands of songs or pictures on one terabyte of storage space. Even more amazing than the figures is how little it costs you. Google Drive, for example, offers 15 GB (gigabytes) of storage for free according to its support site.
Velocity is the speed at which data is generated. It covers how data is both produced and consumed. For example, batch processing is how data feeds are sent between systems, where blocks of records or bundles of files are sent and received. Modern approaches use real-time streams, where the data flow is in a constant state of movement.
Variety is all of the different formats that data can be stored in, including text, image, database tables, and files. This variety has created both challenges and opportunities for analysis because of the different technologies and techniques required to work with the data.
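
As a rough illustration of volume and velocity, the sketch below estimates how many songs fit on one terabyte and contrasts batch delivery with streaming. The 4 MB song size is an assumed average, and the function names items_per_terabyte, in_batches, and as_stream are hypothetical helpers written for this example; they are not taken from the book.

```python
# Assumed figures for illustration only; real file sizes vary widely.
BYTES_PER_MB = 10**6   # decimal (SI) megabyte
BYTES_PER_TB = 10**12  # decimal (SI) terabyte
ASSUMED_SONG_MB = 4    # hypothetical average size of one compressed song


def items_per_terabyte(item_size_mb):
    """Return how many items of the given size fit in one terabyte."""
    return int(BYTES_PER_TB // (item_size_mb * BYTES_PER_MB))


def in_batches(records, batch_size):
    """Batch style: group records into fixed-size blocks before sending."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # send the final, partially filled block
        yield batch


def as_stream(records):
    """Streaming style: hand over each record as soon as it arrives."""
    for record in records:
        yield record


songs = items_per_terabyte(ASSUMED_SONG_MB)
batches = list(in_batches(range(7), batch_size=3))
stream = list(as_stream(range(7)))

print(songs)    # number of assumed 4 MB songs per terabyte
print(batches)  # records grouped into blocks of up to three
print(stream)   # records delivered one at a time
```

Under the assumed 4 MB per song, one terabyte holds a quarter of a million songs, which is consistent with the "hundreds of thousands" figure above. The batch generator waits until a block is full before handing it over, while the stream passes each record along immediately.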

Understanding the 3Vs is important for data analysis because you must become good at being both a consumer and a producer of data. Simple questions such as how your data is stored, when a file was produced, where a database table is located, and in what format you should store the output of your analysis can all be addressed by understanding the 3Vs.
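
Questions like these can often be answered directly from a file's metadata. The following is a minimal sketch using only the Python standard library; describe_file is a hypothetical helper and sales_sample.csv is a made-up file created just for the demonstration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Summarize where a file lives, its size, when it last changed, and its format."""
    p = Path(path)
    info = p.stat()
    return {
        "location": str(p.resolve()),  # where is the data stored?
        "size_bytes": info.st_size,    # how much data is there?
        # When was the file last written? A rough hint of how fresh the data is.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Which format is it in? Taken from the file extension only.
        "format": p.suffix.lstrip(".").lower() or "unknown",
    }


# Create a small, hypothetical CSV file so the example runs on its own.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sales_sample.csv"
    sample.write_text("region,units\nnorth,10\n")
    summary = describe_file(sample)

print(summary["format"], summary["size_bytes"], summary["modified_utc"])
```

Relying on the file extension is a simplification; a file named .csv is not guaranteed to contain comma-separated values, so treat the reported format as a starting point.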

Some argue that the 3Vs should be expanded to include Value, Visualization, and Veracity, a view I disagree with. No worries, we will cover these concepts throughout this book.

This leads us to a formal definition of data analysis: a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making, as stated in Review of business intelligence through data analysis.

Xia, B. S., & Gong, P. (2015). Review of business intelligence through data analysis. Benchmarking, 21(2), 300-311. doi:10.1108/BIJ-08-2012-0050
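
To make the four verbs in this definition concrete, here is a minimal sketch that inspects, cleanses, transforms, and models a handful of records. The records are invented for illustration, the "model" is nothing more than an average per region, and the example uses only the standard library rather than the pandas techniques covered later in the book.

```python
from statistics import mean

# Hypothetical raw records; the values are made up for illustration.
raw_rows = [
    {"region": "North", "units": "10"},
    {"region": "north ", "units": "14"},
    {"region": "South", "units": ""},  # missing value
    {"region": "South", "units": "8"},
]

# Inspect: count how many rows are missing a units value.
missing = sum(1 for row in raw_rows if not row["units"].strip())

# Cleanse: drop rows with missing units and normalize the region labels.
clean_rows = [
    {"region": row["region"].strip().lower(), "units": row["units"]}
    for row in raw_rows
    if row["units"].strip()
]

# Transform: convert the text values into integers.
for row in clean_rows:
    row["units"] = int(row["units"])

# Model: summarize the data with a simple average per region.
by_region = {}
for row in clean_rows:
    by_region.setdefault(row["region"], []).append(row["units"])
average_units = {region: mean(values) for region, values in by_region.items()}

print(missing)
print(average_units)
```

Each step feeds the next: the inspection reveals the missing value, the cleansing removes it and reconciles the inconsistent labels, and only then can the values be converted and summarized.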

What I like about this definition is its focus on solving problems using data rather than on which technologies are used. To make this possible, there have been some significant technological milestones, the introduction of new concepts, and people who have broken down the barriers.

To showcase the evolution of data analysis, I compiled a few tables of key events from the...

Title Page
Copyright and Credits
About Packt
Foreword
Contributors
Preface
Section 1: Data Analysis Essentials
Fundamentals of Data Analysis
Overview of Python and Installing Jupyter Notebook
Getting Started with NumPy
Creating Your First pandas DataFrame
Gathering and Loading Data in Python
Section 2: Solutions for Data Discovery
Visualizing and Working with Time Series Data
Exploring, Cleaning, Refining, and Blending Datasets
Understanding Joins, Relationships, and Aggregates
Plotting, Visualization, and Storytelling
Section 3: Working with Unstructured Big Data
Exploring Text Data and Unstructured Data
Practical Sentiment Analysis
Bringing It All Together
Works Cited
Other Books You May Enjoy
