Practical Data Analysis Using Jupyter Notebook
eBook - ePub

Practical Data Analysis Using Jupyter Notebook

Learn how to speak the language of data by extracting useful and actionable insights using Python

  1. 322 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

Practical Data Analysis Using Jupyter Notebook

Learn how to speak the language of data by extracting useful and actionable insights using Python

About this book

Understand data analysis concepts to make accurate decisions based on data using Python programming and Jupyter Notebook

Key Features

  • Find out how to use Python code to extract insights from data using real-world examples
  • Work with structured data and free text sources to answer questions and add value using data
  • Perform data analysis from scratch with the help of clear explanations for cleaning, transforming, and visualizing data

Book Description

Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data.

After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps.

Later chapters teach you about storytelling with data using charts, such as histograms and scatter plots. As you advance, you'll understand how to work with unstructured data using natural language processing (NLP) techniques to perform sentiment analysis. All the knowledge you gain will help you discover key patterns and trends in data using real-world examples. In addition to this, you will learn how to handle data of varying complexity to perform efficient data analysis using modern Python libraries.

By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.

What you will learn

  • Understand the importance of data literacy and how to communicate effectively using data
  • Find out how to use Python packages such as NumPy, pandas, Matplotlib, and the Natural Language Toolkit (NLTK) for data analysis
  • Wrangle data and create DataFrames using pandas
  • Produce charts and data visualizations using time-series datasets
  • Discover relationships and how to join data together using SQL
  • Use NLP techniques to work with unstructured data to create sentiment analysis models
  • Discover patterns in real-world datasets that provide accurate insights

Who this book is for

This book is for aspiring data analysts and data scientists looking for hands-on tutorials and real-world examples to understand data analysis concepts using SQL, Python, and Jupyter Notebook. Anyone looking to evolve their skills to become data-driven personally and professionally will also find this book useful. No prior knowledge of data analysis or programming is required to get started with this book.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Practical Data Analysis Using Jupyter Notebook by Marc Wintjen in PDF and/or ePUB format, as well as other popular books in Computer Science & Artificial Intelligence (AI) & Semantics. We have over one million books available in our catalogue for you to explore.
Section 1: Data Analysis Essentials
In this section, we will learn how to speak the language of data by extracting useful and actionable insights from data using Python and Jupyter Notebook. We'll begin with the fundamentals of data analysis and work with the right tools to help you analyze data effectively. After your workspace has been set up, we'll learn how to work with data using two popular open source libraries available in Python: NumPy and pandas. This will lay the foundation for you to understand data so that you can prepare for Section 2: Solutions for Data Discovery.
This section includes the following chapters:
  • Chapter 1, Fundamentals of Data Analysis
  • Chapter 2, Overview of Python and Installing Jupyter Notebook
  • Chapter 3, Getting Started with NumPy
  • Chapter 4, Creating Your First pandas DataFrame
  • Chapter 5, Gathering and Loading Data in Python
Fundamentals of Data Analysis
Welcome and thank you for reading my book. I'm excited to share my passion for data and I hope to provide the resources and insights to fast-track your journey into data analysis. My goal is to educate, mentor, and coach you throughout this book on the techniques used to become a top-notch data analyst. During this process, you will get hands-on experience using the latest open source technologies available such as Jupyter Notebook and Python. We will stay within that technology ecosystem throughout this book to avoid confusion. However, you can be confident the concepts and skills learned are transferable across open source and vendor solutions with a focus on all things data.
In this chapter, we will cover the following:
  • The evolution of data analysis and why it is important
  • What makes a good data analyst?
  • Understanding data types and why they are important
  • Data classifications and data attributes explained
  • Understanding data literacy

The evolution of data analysis and why it is important

To begin, we should define what data is. You will find varying definitions but I would define data as the digital persistence of facts, knowledge, and information consolidated for reference or analysis. The focus of my definition should be the word persistence because digital facts remain even after the computers used to create them are powered down and they are retrievable for future use. Rather than focus on the formal definition, let's discuss the world of data and how it impacts our daily lives. Whether you are reading a review to decide which product to buy or viewing the price of a stock, consuming information has become significantly easier to allow you to make informed data-driven decisions.
Data has been entangled into products and services across every industry from farming to smartphones. For example, America's Grow-a-Row, a New Jersey farm to food bank charity, donated over 1.5 million pounds of fresh produce to feed people in need throughout the region each year, according to their annual report. America's Grow-a-Row has thousands of volunteers and uses data to maximize production yields during the harvest season.
As the demand for being a consumer of data has increased, so has the supply side, which is characterized as the producer of data. Producing data has increased in scale as the technology innovations have evolved. I'll discuss this in more detail shortly, but this large scale consumption and production can be summarized as big data. A National Institute of Standards and Technology report defined big data as consisting of extensive datasets—primarily in the characteristics of volume, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.
This explosion of big data is characterized by the 3Vs, which are Volume, Velocity, and Variety,and has become a widely accepted concept among data professionals:
  • Volume is based on the quantity of data that is stored in any format such as image files, movies, and database transactions, which are measured in gigabytes, terabytes, or even zettabytes. To give context, you can store hundreds of thousands of songs or pictures on one terabyte of storage space. Even more amazing than the figures is how much it costs you. Google Drive, for example, offers up to 5 TB (terabytes) of storage for free according to their support site.
  • Velocity is the speed at which data is generated. This process covers how data is both produced and consumed. For example, batch processing is how data feeds are sent between systems where blocks of records or bundles of files are sent and received. Modern velocity approaches are real time, streams of data where the data flow is in a constant state of movement.
  • Variety is all of the different formats that data can be stored in, including text, image, database tables, and files. This variety has created both challenges and opportunities for analysis because of the different technologies and techniques required to work with the data.
Understanding the 3Vs is important for data analysis because you must become good at being both a consumer and producer of data. The simple questions of how your data is stored, when this file was produced, where the database table is located, and in what format I shouldstore the output of my analysis of the data can all be addressed by understanding the 3Vs.
There is some debate—for which I disagree—that the 3Vs should increase to include Value, Visualization, and Veracity. No worries, we will cover these concepts throughout this book.
This leads us to a formal definition of data analysis which is defined as a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusion, and supporting decision-making, as stated in Review of business intelligence through data analysis.
Xia, B. S., & Gong, P. (2015). Review of business intelligence through data analysis. Benchmarking, 21(2), 300-311. doi:10.1108/BIJ-08-2012-0050
What I like about this definition is the focus on solving problems using data without the focus on which technologies are used. To make this possible there have been some significant technological milestones, the introduction of new concepts, and people who have broken down the barriers.
To showcase the evolution of data analysis, I compiled a few tables of key events from the...

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Foreword
  5. Contributors
  6. Preface
  7. Section 1: Data Analysis Essentials
  8. Fundamentals of Data Analysis
  9. Overview of Python and Installing Jupyter Notebook
  10. Getting Started with NumPy
  11. Creating Your First pandas DataFrame
  12. Gathering and Loading Data in Python
  13. Section 2: Solutions for Data Discovery
  14. Visualizing and Working with Time Series Data
  15. Exploring, Cleaning, Refining, and Blending Datasets
  16. Understanding Joins, Relationships, and Aggregates
  17. Plotting, Visualization, and Storytelling
  18. Section 3: Working with Unstructured Big Data
  19. Exploring Text Data and Unstructured Data
  20. Practical Sentiment Analysis
  21. Bringing It All Together
  22. Works Cited
  23. Other Books You May Enjoy