Practical Data Science with Python
eBook - ePub

Learn tools and techniques from hands-on examples to extract insights from data

  • 620 pages
  • English
  • ePUB (mobile friendly)
  • Available on iOS & Android

About this book

Learn to effectively manage data and execute data science projects from start to finish using Python

Key Features

  • Understand and utilize data science tools in Python, such as specialized machine learning algorithms and statistical modeling
  • Build a strong data science foundation with the best data science tools available in Python
  • Add value to yourself, your organization, and society by extracting actionable insights from raw data

Book Description

Practical Data Science with Python teaches you core data science concepts, with real-world and realistic examples, and strengthens your grip on the basic as well as advanced principles of data preparation and storage, statistics, probability theory, machine learning, and Python programming, helping you build a solid foundation to gain proficiency in data science.

The book starts with an overview of basic Python skills and then introduces foundational data science techniques, followed by a thorough explanation of the Python code needed to execute the techniques. You'll understand the code by working through the examples. The code has been broken down into small chunks (a few lines or a function at a time) to enable thorough discussion.

As you progress, you will learn how to perform data analysis while exploring the functionalities of key data science Python packages, including pandas, SciPy, and scikit-learn. Finally, the book covers ethics and privacy concerns in data science and suggests resources for improving data science skills, as well as ways to stay up to date on new data science developments.

By the end of the book, you should be able to comfortably use Python for basic data science projects and should have the skills to execute the data science process on any data source.

What you will learn

  • Use Python data science packages effectively
  • Clean and prepare data for data science work, including feature engineering and feature selection
  • Model data with classic statistical methods (such as t-tests) and essential machine learning algorithms, such as random forests and boosted models
  • Evaluate model performance
  • Compare and understand different machine learning methods
  • Interact with Excel spreadsheets through Python
  • Create automated data science reports through Python
  • Get to grips with text analytics techniques

Who this book is for

The book is intended for beginners, including students starting or about to start a data science, analytics, or related program (e.g. Bachelor's, Master's, bootcamp, online courses), recent college graduates who want to learn new skills to set them apart in the job market, professionals who want to learn hands-on data science techniques in Python, and those who want to shift their career to data science.

The book requires basic familiarity with Python. A "getting started with Python" section has been included to get complete novices up to speed.

Practical Data Science with Python is written by Nathan George.

2

Getting Started with Python

As we already discovered in Chapter 1, Introduction to Data Science, Python is the most commonly used language for data science, and so we will be using it exclusively in this book. In this chapter, we'll go through a crash course in Python. This should get you up to speed with the basics, although to learn Python in more depth, you should seek out additional resources. For example, Fabrizio Romano's Learning Python from Packt is one resource worth checking out.
In this chapter, we'll cover the following topics:
  • Installing Python with a Python distribution (Anaconda)
  • Editing Python code with code text editors and Jupyter Notebooks
  • Running code with Jupyter Notebooks, IPython, and the command line
  • Installing Python packages and creating virtual environments
  • The basics of Python programming, including strings, numbers, loops, data structures, functions, and classes
  • Debugging errors and using documentation
  • Software engineering best practices, such as Git for version control
Let's get started with installing Python!

Installing Python with Anaconda and getting started

There are several ways to install Python, but the one we will use here is the Anaconda Python distribution. A distribution is a way of installing Python along with several Python packages/libraries, and possibly some other software. This saves us some time when installing and can give us additional functionalities, such as the ability to easily install complex packages with software dependencies. If you are unable to install Anaconda for whatever reason (for example, system administrative permission restrictions), you can instead try installing Python from other sources, such as the official Python website (www.python.org/downloads/) or the Microsoft Store. In that case, you will need to use the pip package manager exclusively, and not conda.
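Whichever route you take, a quick way to see which packages an installation already provides is to ask Python itself. The following is a minimal sketch rather than code from the book: the helper name find_missing_packages and the list of packages checked are illustrative assumptions. The names used are the import names of pandas, SciPy, and scikit-learn (the last of which is imported as sklearn).

```python
from importlib.util import find_spec


def find_missing_packages(import_names):
    """Return the import names that cannot be found in this Python installation."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in import_names if find_spec(name) is None]


# Illustrative list: import names for pandas, SciPy, and scikit-learn.
packages_to_check = ["pandas", "scipy", "sklearn"]
missing = find_missing_packages(packages_to_check)

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Any package reported as missing would then be installed with conda (on an Anaconda installation) or with pip (on other installations).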

Installing Anaconda

We use Anaconda for several reasons. For one, Anaconda is widely used in the Python community, meaning the network effects are strong: a large community is available to help us with problems (for example, through Stack Overflow), and more people are contributing to the project. Another advantage of Anaconda is that it makes installing Python packages with complex dependencies much easier. For example, neural network packages such as TensorFlow and PyTorch require CUDA and cuDNN software to be installed in order to run on NVIDIA GPUs, and H2O (a machine learning and AI software package) requires Java to be installed properly. Anaconda takes care of these dependencies for us when it installs these packages, saving us huge headaches and time. Anaconda also comes with a GUI (Anaconda Navigator) and some other bells and whistles, and it allows us to create virtual environments with different versions of Python, which we will get to soon.
Installing Anaconda should be relatively easy. We simply query an internet search engine for "download Anaconda" and install it with the installer (currently, the download page is located at www.anaconda.com/products/individual). When installing Anaconda on Mac, there shouldn't be any options that change things drastically – going with the defaults should be fine. On Linux, be sure to select yes when asked Do you wish the installer to initialize Anaconda3 by running conda init?. The recommended settings from Anaconda's documentation should work well for installation (docs.anaconda.com/anaconda/install/). For Windows, I usually check the box for Add Anaconda3 to my PATH environment variable, even though this is not recommended. This will allow us to run Python and conda from any terminal or shell on our system.
You could also manually add conda and Anaconda Python to your PATH environment variable, but checking the box upon installation is easier (even though Anaconda doesn't recommend doing it). In my experience, I haven't had problems when checking the Add to PATH box on Windows Anaconda installations.
Once Anaconda is installed, you should be able to open a terminal or Command Prompt and run the command python to get to a basic Python shell, which we will cover in the next section. Now on to the next step – actually running Python code!
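If the python command does not start as expected, it can help to check what the system actually finds on the PATH. The sketch below is an illustration and not part of the book's code: the function name describe_python_setup is hypothetical, and it relies only on the standard library modules sys and shutil. It can be pasted into any working Python session or saved to a file and run.

```python
import shutil
import sys


def describe_python_setup():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        # Full path of the interpreter that is executing this script.
        "interpreter": sys.executable,
        # Version as major.minor.micro, for example 3.9.7.
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # shutil.which returns None when the command is not on the PATH.
        "python_on_path": shutil.which("python"),
        "conda_on_path": shutil.which("conda"),
    }


for key, value in describe_python_setup().items():
    print(f"{key}: {value}")
```

A value of None for python_on_path or conda_on_path suggests that the corresponding command is not on the PATH of the terminal being used, which is the situation the Add to PATH checkbox is meant to avoid on Windows.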

Running Python code

We will cover several options for running code here: the base Python shell, IPython, and Jupyter Notebooks. Some text editors and IDEs also allow us to run Python code from within the editor or IDE, although we will not cover that here.

The Python shell

There are several ways to run Python code, but let's start with the simplest – running code through a simple Python shell. Python is what's called an "interpreted" language, meaning code can be run on-the-fly rather than being compiled to machine code ahead of time. Compiling code means translating the human-readable code to machine code, which is a string of 1s and 0s that are given as instructions to a CPU. Interpreting code means running it by translating Python code on-the-fly to instructions the computer can run more directly. Compiled code usually runs faster than interpreted code, but we have the extra steps of compiling the program and then running it, which means we generally cannot run compiled code interactively one piece at a time. So, interpreted code has the advantage of being runnable interactively, one line at a time, while compiled code typically runs faster.
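To make the idea of running code one line at a time more concrete, here is a small sketch that mimics it with Python's built-in compile and exec functions. It is only an illustration under stated assumptions: the helper name run_line_by_line and the two sample lines are made up for this example, and the real Python shell does considerably more (such as printing results and handling multi-line statements). In CPython, the standard Python implementation, each line is first translated to an intermediate form called bytecode, which the interpreter then executes.

```python
def run_line_by_line(source_lines):
    """Execute each line of source separately, sharing one namespace between lines."""
    namespace = {}
    for line in source_lines:
        # compile() turns the text into a code object (bytecode in CPython) ...
        code_object = compile(line, "<interactive-demo>", "exec")
        # ... and exec() runs it immediately, before the next line is even read.
        exec(code_object, namespace)
    return namespace


result = run_line_by_line(["x = 2 + 2", "y = x * 10"])
print(result["x"], result["y"])  # prints: 4 40
```

Because each line runs as soon as it is read, the second line can use the value of x created by the first, which is what makes interactive, exploratory work possible.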
To try out Python's interpreted code execution, we should first open a terminal on Mac or Linux, or an Anaconda PowerShell Prompt from the Start menu on Windows (PowerShell has more commands available than a plain Command Prompt on Windows). With our command line ready, we then simply type python, et voilà! We have access to the Python shell. You can try some basic commands, such as 2 + 2 and print('hello').
This...

Table of contents

  1. Preface
  2. An Introduction and the Basics
  3. Introduction to Data Science
  4. Getting Started with Python
  5. Dealing with Data
  6. SQL and Built-in File Handling Modules in Python
  7. Loading and Wrangling Data with Pandas and NumPy
  8. Exploratory Data Analysis and Visualization
  9. Data Wrangling Documents and Spreadsheets
  10. Web Scraping
  11. Statistics for Data Science
  12. Probability, Distributions, and Sampling
  13. Statistical Testing for Data Science
  14. Machine Learning
  15. Preparing Data for Machine Learning: Feature Selection, Feature Engineering, and Dimensionality Reduction
  16. Machine Learning for Classification
  17. Evaluating Machine Learning Classification Models and Sampling for Classification
  18. Machine Learning with Regression
  19. Optimizing Models and Using AutoML
  20. Tree-Based Machine Learning Models
  21. Support Vector Machine (SVM) Machine Learning Models
  22. Text Analysis and Reporting
  23. Clustering with Machine Learning
  24. Working with Text
  25. Wrapping Up
  26. Data Storytelling and Automated Reporting/Dashboarding
  27. Ethics and Privacy
  28. Staying Up to Date and the Future of Data Science
  29. Other Books You May Enjoy
  30. Index