Practical Data Science with Python

Name: Practical Data Science with Python
Author: Nathan George

Learn tools and techniques from hands-on examples to extract insights from data

620 pages
English

About This Book

Learn to effectively manage data and execute data science projects from start to finish using Python

Key Features

Understand and utilize data science tools in Python, such as specialized machine learning algorithms and statistical modeling
Build a strong data science foundation with the best data science tools available in Python
Add value to yourself, your organization, and society by extracting actionable insights from raw data

Book Description

Practical Data Science with Python teaches you core data science concepts through real-world examples. It strengthens your grip on both basic and advanced principles of data preparation and storage, statistics, probability theory, machine learning, and Python programming, helping you build a solid foundation for gaining proficiency in data science.

The book starts with an overview of basic Python skills and then introduces foundational data science techniques, followed by a thorough explanation of the Python code needed to execute the techniques. You'll understand the code by working through the examples. The code has been broken down into small chunks (a few lines or a function at a time) to enable thorough discussion.

As you progress, you will learn how to perform data analysis while exploring the functionalities of key data science Python packages, including pandas, SciPy, and scikit-learn. Finally, the book covers ethics and privacy concerns in data science and suggests resources for improving data science skills, as well as ways to stay up to date on new data science developments.

By the end of the book, you should be able to comfortably use Python for basic data science projects and should have the skills to execute the data science process on any data source.

What you will learn

Use Python data science packages effectively
Clean and prepare data for data science work, including feature engineering and feature selection
Model data with classic statistical methods (such as t-tests) and essential machine learning algorithms, such as random forests and boosted models
Evaluate model performance
Compare and understand different machine learning methods
Interact with Excel spreadsheets through Python
Create automated data science reports through Python
Get to grips with text analytics techniques

Who this book is for

The book is intended for beginners, including students starting or about to start a data science, analytics, or related program (e.g. Bachelor's, Master's, bootcamp, online courses), recent college graduates who want to learn new skills to set them apart in the job market, professionals who want to learn hands-on data science techniques in Python, and those who want to shift their career to data science.

The book requires basic familiarity with Python. A "getting started with Python" section has been included to get complete novices up to speed.

Information

Publisher

Packt Publishing

Year

2021

ISBN

9781801076654

Topic

Computer Science

Subtopic

Data Modelling & Design

2 Getting Started with Python

As we discovered in Chapter 1, Introduction to Data Science, Python is the most commonly used language for data science, so we will use it exclusively in this book. In this chapter, we'll go through a crash course in Python. This should get you up to speed with the basics; to learn Python in more depth, you should seek out additional resources, such as Fabrizio Romano's Learning Python from Packt.

In this chapter, we'll cover the following topics:

Installing Python with a Python distribution (Anaconda)
Editing Python code with code editors and Jupyter Notebooks
Running code with Jupyter Notebooks, IPython, and the command line
Installing Python packages and creating virtual environments
The basics of Python programming, including strings, numbers, loops, data structures, functions, and classes
Debugging errors and using documentation
Software engineering best practices, such as Git for version control

Let's get started with installing Python!

Installing Python with Anaconda and getting started

There are several ways to install Python, but the one we will use here is the Anaconda Python distribution. A distribution bundles Python with several Python packages/libraries, and possibly some other software. This saves us time during installation and can give us additional functionality, such as the ability to easily install complex packages with software dependencies. If you are unable to install Anaconda for whatever reason (for example, restrictions on administrative permissions), you can instead install Python from other sources, such as the official Python website (www.python.org/downloads/) or the Microsoft Store. In that case, you will need to use the pip package manager exclusively, rather than conda.
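One quick way to see which packages a distribution has already installed is to query package metadata from Python itself. The sketch below uses the standard library's importlib.metadata module (Python 3.8+); the helper name installed_versions is hypothetical, and the package names in the example list are only illustrations, not a statement of what Anaconda ships.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment
            versions[name] = None
    return versions


# Example names only; swap in whatever packages you care about
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The same check works whether Python came from Anaconda or from python.org, which makes it handy for spotting packages you still need to install with conda or pip.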

Installing Anaconda

We use Anaconda for several reasons. For one, Anaconda is widely used in the Python community, so the network effects are strong: a large community is available to help us with problems (for example, through Stack Overflow), and more people contribute to the project. Another advantage of Anaconda is that it makes installing Python packages with complex dependencies much easier. For example, neural network packages such as TensorFlow and PyTorch require the CUDA and cuDNN software to be installed for GPU acceleration, and H2O (a machine learning and AI software package) requires Java to be installed properly. Anaconda takes care of these dependencies for us when it installs these packages, saving us considerable time and headaches. Anaconda also comes with a GUI (Anaconda Navigator) and some other bells and whistles. Finally, it allows us to create virtual environments with different versions of Python, which we will get to soon.
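Because virtual environments come up repeatedly, it can help to check from inside Python which environment is active. This is a minimal sketch resting on two heuristics: `conda activate` sets the CONDA_DEFAULT_ENV environment variable, and a venv-style environment makes sys.prefix differ from sys.base_prefix. The function name describe_environment is hypothetical, and neither check is exhaustive.

```python
import os
import sys


def describe_environment():
    """Return a short, heuristic description of the active Python environment."""
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        # Set by `conda activate`; holds the active environment's name
        return f"conda environment: {conda_env}"
    if sys.prefix != sys.base_prefix:
        # venv/virtualenv environments point sys.prefix at the environment folder
        return f"virtual environment at {sys.prefix}"
    return f"base interpreter at {sys.prefix}"


print(describe_environment())
print("Python", sys.version.split()[0])
```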

Installing Anaconda should be relatively easy. We simply query an internet search engine for "download Anaconda" and install it with the installer (currently, the download page is located at www.anaconda.com/products/individual). When installing Anaconda on Mac, there shouldn't be any options that change things drastically – going with the defaults should be fine. On Linux, be sure to select yes when asked Do you wish the installer to initialize Anaconda3 by running conda init?. The recommended settings from Anaconda's documentation should work well for installation (docs.anaconda.com/anaconda/install/). For Windows, I usually check the box for Add Anaconda3 to my PATH environment variable, even though this is not recommended. This will allow us to run Python and conda from any terminal or shell on our system.

You could instead add conda and Anaconda Python to your PATH environment variable manually, but checking the box during installation is easier. In my experience, checking the Add to PATH box on Windows Anaconda installations hasn't caused problems.
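To confirm that the PATH setting worked, you can ask Python where (or whether) the python and conda commands resolve. The standard library's shutil.which searches PATH much as a shell does and returns None when a command is not found. The helper find_commands below is a hypothetical name used only for this sketch.

```python
import shutil


def find_commands(commands):
    """Return the full path each command resolves to on PATH, or None."""
    return {cmd: shutil.which(cmd) for cmd in commands}


for cmd, path in find_commands(["python", "conda"]).items():
    print(f"{cmd:>6}: {path or 'NOT found on PATH'}")
```

If conda shows as not found while python resolves, the python being picked up is probably not the Anaconda one.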

Once Anaconda is installed, you should be able to open a terminal or Command Prompt and run the command python to get to a basic Python shell, which we will cover in the next section. Now on to the next step – actually running Python code!

Running Python code

We will cover several options for running code here: the base Python shell, IPython, and Jupyter Notebooks. Some text editors and IDEs also allow us to run Python code from within the editor or IDE, although we will not cover that here.

The Python shell

There are several ways to run Python code, but let's start with the simplest: running code through a basic Python shell. Python is what's called an "interpreted" language, meaning code can be run on the fly rather than first being converted into machine code. Compiling code means translating human-readable code into machine code, a string of 1s and 0s that serves as instructions for a CPU. Interpreting code means running it by translating it on the fly into instructions the computer can execute more directly. Compiled code usually runs faster than interpreted code, but it requires the extra step of compiling the program before running it, so we cannot run it interactively one bit at a time. In short, interpreted code can be run interactively, one line at a time, while compiled code typically runs faster.
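The compile-then-run distinction can be seen from inside Python. Strictly speaking, the standard CPython interpreter first compiles source into bytecode (not machine code) and then interprets that bytecode. The sketch below uses the built-in compile() and exec() functions plus the dis module to show both steps for a single line.

```python
import dis

source = "result = 2 + 2"

# Step 1: translate the source text into a bytecode code object
code_obj = compile(source, "<example>", "exec")

# Step 2: run the bytecode immediately, in a fresh namespace
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])  # prints 4

# Peek at the bytecode instructions the interpreter executes
for instr in dis.get_instructions(code_obj):
    print(instr.opname, instr.argrepr)
```

The exact instruction names vary between Python versions, so treat the dis output as illustrative rather than fixed.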

To try out Python's interpreted code execution, we should first open a terminal on Mac or Linux, or an Anaconda PowerShell Prompt from the Start menu on Windows (PowerShell has more commands available than a plain Command Prompt on Windows). With our command line ready, we then simply type python, et voilà! We have access to the Python shell. You can try some basic commands, such as 2 + 2 and print('hello').
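For comparison, the standard library's code module offers the same read-and-evaluate behavior that the >>> prompt provides, so the session above can be reproduced from a script. This is only an illustration; in practice you would simply type the two commands at the prompt.

```python
import io
from code import InteractiveInterpreter
from contextlib import redirect_stdout

interpreter = InteractiveInterpreter()
buffer = io.StringIO()

# Feed the interpreter one line at a time, as you would type them at the >>> prompt
with redirect_stdout(buffer):
    for line in ["2 + 2", "print('hello')"]:
        # "single" mode echoes expression values, just like the interactive shell
        interpreter.runsource(line, symbol="single")

print(buffer.getvalue(), end="")
```

As at the real prompt, the bare expression 2 + 2 has its value echoed automatically, while print('hello') writes its output directly.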

This...