2
Getting Started with Python
As we already discovered in Chapter 1, Introduction to Data Science, Python is the most commonly used language for data science, so we will use it exclusively in this book. In this chapter, we'll go through a crash course in Python. This should get you up to speed with the basics; to learn Python in more depth, you should seek out additional resources, such as Fabrizio Romano's Learning Python from Packt.
In this chapter, we'll cover the following topics:
- Installing Python with a Python distribution (Anaconda)
- Editing Python code with code text editors and Jupyter Notebooks
- Running code with Jupyter Notebooks, IPython, and the command line
- Installing Python packages and creating virtual environments
- The basics of Python programming, including strings, numbers, loops, data structures, functions, and classes
- Debugging errors and using documentation
- Software engineering best practices, such as Git for version control
Let's get started with installing Python!
Installing Python with Anaconda and getting started
There are several ways to install Python, but the one we will use here is the Anaconda Python distribution. A distribution bundles Python together with a set of Python packages/libraries, and possibly some other software. This saves us time during installation and can give us additional functionality, such as the ability to easily install complex packages with software dependencies. If you are unable to install Anaconda for whatever reason (for example, because you lack administrator permissions on your system), you can instead install Python from other sources, such as the official Python website (www.python.org/downloads/) or the Microsoft Store. In that case, you will need to use the pip package manager exclusively, rather than conda.
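To see what a distribution actually installed, one option is to list the packages visible to the running interpreter. The following is a minimal sketch using the standard library's importlib.metadata module (Python 3.8 or later); the helper name installed_packages is our own, hypothetical choice. Under Anaconda it should list many packages, while a fresh install from python.org will show far fewer. Python packages installed by conda generally appear here too, although non-Python dependencies (such as Java or CUDA) will not.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a sorted list of (name, version) pairs for installed Python packages."""
    pairs = set()
    for dist in distributions():
        # Fall back to a placeholder if a package has incomplete metadata
        name = dist.metadata["Name"] or "<unknown>"
        version = dist.version or "<unknown>"
        pairs.add((name, version))
    # Sort case-insensitively by package name
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    packages = installed_packages()
    print(f"{len(packages)} packages installed")
    for name, version in packages[:10]:  # show only the first ten
        print(f"{name}=={version}")
```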
Installing Anaconda
We have several reasons for using Anaconda. For one, Anaconda is widely used in the Python community, so its network effects are strong: a large community is available to help us with problems (for example, through Stack Overflow), and more people contribute to the project. Another advantage of Anaconda is that it makes installing Python packages with complex dependencies much easier. For example, neural network packages such as TensorFlow and PyTorch need the CUDA and cuDNN libraries installed in order to use an NVIDIA GPU, and H2O (a machine learning and AI software package) requires Java to be installed properly. Anaconda takes care of these dependencies for us when it installs these packages, saving us considerable time and many headaches. Anaconda also comes with a GUI (Anaconda Navigator) and some other bells and whistles. It also allows us to create virtual environments with different versions of Python, which we will get to soon.
Installing Anaconda should be relatively easy. We simply query an internet search engine for "download Anaconda" and install it with the installer (currently, the download page is located at www.anaconda.com/products/individual). When installing Anaconda on Mac, there shouldn't be any options that change things drastically – going with the defaults should be fine. On Linux, be sure to select yes when asked Do you wish the installer to initialize Anaconda3 by running conda init?. The recommended settings from Anaconda's documentation should work well for installation (docs.anaconda.com/anaconda/install/). For Windows, I usually check the box for Add Anaconda3 to my PATH environment variable, even though this is not recommended. This will allow us to run Python and conda from any terminal or shell on our system.
You could also manually add conda and Anaconda Python to your PATH environment variable, but checking the box upon installation is easier (even though Anaconda doesn't recommend doing it). In my experience, I haven't had problems when checking the Add to PATH box on Windows Anaconda installations.
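Whichever way you set up your PATH, you can check what your system will actually find. This minimal sketch uses shutil.which from the standard library, which searches the directories listed in PATH much as a shell would; the function name find_on_path is hypothetical. If conda comes back as None, it is not on your PATH.

```python
import os
import shutil


def find_on_path(programs=("python", "conda")):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, location in find_on_path().items():
        print(f"{name}: {location or 'not found on PATH'}")
    # List the directories that are searched, in order
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        print("   ", entry)
```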
Once Anaconda is installed, you should be able to open a terminal or Command Prompt and run the command python to get to a basic Python shell, which we will cover in the next section. Now on to the next step – actually running Python code!
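Because a system Python and Anaconda's Python can coexist on the same machine, it can be worth confirming which interpreter the python command actually launched. The sketch below is one way to do that; describe_interpreter is a hypothetical helper name, and the conda check is a heuristic that assumes conda environments contain a conda-meta directory under sys.prefix.

```python
import os
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the running Python interpreter."""
    return {
        "version": platform.python_version(),  # e.g. "3.11.4"
        "executable": sys.executable,  # full path to the running interpreter
        # Heuristic (assumption): conda environments contain a conda-meta folder
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```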
Running Python code
We will cover several options for running code here: the base Python shell, IPython, and Jupyter Notebooks. Some text editors and IDEs also allow us to run Python code from within the editor or IDE, although we will not cover that here.
The Python shell
There are several ways to run Python code, but let's start with the simplest: running code through a basic Python shell. Python is what's called an "interpreted" language, meaning code can be run on-the-fly without first being compiled into machine code. Compiling code means translating human-readable code into machine code, which is a string of 1s and 0s given as instructions to a CPU. Interpreting code means translating Python code on-the-fly into instructions the computer can run more directly. Compiled code usually runs faster than interpreted code, but it requires the extra step of compiling the program before running it, so we cannot run it interactively one bit at a time. In short, interpreted code has the advantage of running interactively, one line at a time, while compiled code typically runs faster.
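To peek at what "on-the-fly" means in practice, note that CPython (the standard Python interpreter) first translates source code into an intermediate bytecode and then executes that bytecode. It does not produce CPU machine code. The built-in compile() function and the standard library's dis module expose this step. This is a minimal sketch, and the exact instructions printed vary between Python versions.

```python
import dis

source = "total = 2 + 3\nprint(total)"

# Translate the source into a code object holding CPython bytecode (not machine code)
code_object = compile(source, "<example>", "exec")

# Print the bytecode instructions the interpreter will step through
dis.dis(code_object)

# Execute the compiled bytecode in a fresh namespace (prints 5)
namespace = {}
exec(code_object, namespace)
```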
To try out Python's interpreted code execution, we should first open a terminal on Mac or Linux, or an Anaconda PowerShell Prompt from the Start menu on Windows (PowerShell has more commands available than a plain Command Prompt on Windows). With our command line ready, we then simply type python, et voilà! We have access to the Python shell. You can try some basic commands, such as 2 + 2 and print('hello').
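One difference between the shell and a script is that the shell automatically prints the value of each expression you type, whereas a script only shows what you explicitly print. The standard library's code.InteractiveConsole emulates the shell's behavior, so this sketch feeds it the two example commands and captures what it prints; run_like_shell is a hypothetical helper name.

```python
import code
import contextlib
import io


def run_like_shell(lines):
    """Feed lines to an InteractiveConsole one at a time and capture its output."""
    console = code.InteractiveConsole()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for line in lines:
            console.push(line)  # each complete line is compiled and run immediately
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_like_shell(["2 + 2", "print('hello')"]), end="")
```

Running this prints 4 followed by hello, just as typing those two commands into the Python shell would. Saved as a script, however, the bare expression 2 + 2 on its own would print nothing.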
This...