The Computer Vision Workshop
eBook - ePub

Develop the skills you need to use computer vision algorithms in your own artificial intelligence projects

  1. 568 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Explore the potential of deep learning techniques in computer vision applications using the Python ecosystem, and build real-time systems for detecting human behavior

Key Features

  • Understand OpenCV and select the right algorithm to solve real-world problems
  • Discover techniques for image and video processing
  • Learn how to apply face recognition in videos to automatically extract key information

Book Description

Computer Vision (CV) has become an important aspect of AI technology. From driverless cars to medical diagnostics and monitoring the health of crops to fraud detection in banking, computer vision is used across all domains to automate tasks. The Computer Vision Workshop will help you understand how computers master the art of processing digital images and videos to mimic human activities.

Starting with an introduction to the OpenCV library, you'll learn how to write your first script using basic image processing operations. You'll then get to grips with essential image and video processing techniques such as histograms, contours, and face processing. As you progress, you'll become familiar with advanced computer vision and deep learning concepts, such as object detection, tracking, and recognition, and finally shift your focus from 2D to 3D visualization. This CV course will enable you to experiment with camera calibration and explore both passive and active canonical 3D reconstruction methods.

By the end of this book, you'll have developed the practical skills necessary for building powerful applications to solve computer vision problems.

What you will learn

  • Access and manipulate pixels in OpenCV using BGR and grayscale images
  • Create histograms to better understand image content
  • Use contours for shape analysis, object detection, and recognition
  • Track objects in videos using a variety of trackers available in OpenCV
  • Discover how to apply face recognition tasks using computer vision techniques
  • Visualize 3D objects in point clouds and polygon meshes using Open3D

Who this book is for

If you are a researcher, developer, or data scientist looking to automate everyday tasks using computer vision, this workshop is for you. A basic understanding of Python and deep learning will help you to get the most out of this workshop.

The Computer Vision Workshop by Hafsa Asad, Vishwesh Ravi Shrimali, and Nikhil Singh is available in PDF and/or ePUB format and is catalogued under Computer Science, Computer Vision & Pattern Recognition.

1. Basics of Image Processing

Overview
This chapter serves as an introduction to the amazing and colorful world of image processing. We will start by understanding images and their various components – pixels, pixel values, and channels. We will then have a look at the commonly used color spaces – red, green, and blue (RGB) and hue, saturation, and value (HSV). Next, we will look at how images are loaded and represented in OpenCV and how they can be displayed using another library called Matplotlib. Finally, we will have a look at how we can access and manipulate pixels. By the end of this chapter, you will be able to implement the concepts of color space conversion.

Introduction

Welcome to the world of images. It's interesting, it's wide, and most importantly, it's colorful. Artificial intelligence (AI), the idea of imparting human-like intelligence to computers, is changing how we, as humans, can use the power of smart computers to perform tasks much faster, more efficiently, and with minimal effort. When the intelligence is focused on images and videos, the domain is referred to as computer vision. Similarly, natural language processing (NLP) is the branch of AI in which we try to understand the meaning behind text. This technology is used by major companies for building AI-based chatbots designed to interact with customers. Both computer vision and NLP share the concepts of deep learning, where we use deep neural networks to complete tasks such as object detection, image classification, word embedding, and more. Coming back to the topic of computer vision, companies have come up with interesting use cases where AI systems have managed to change entire scenarios. Google, for example, came up with the idea of Google Goggles, which can perform several operations, such as image classification, object recognition, and more. Similarly, Tesla's self-driving cars use computer vision extensively to detect pedestrians and vehicles on the road and to detect the lane in which the car is moving.
This book will serve as a journey through the interesting components of computer vision. We will start by understanding images and then go over how they can be processed. After a couple of chapters, we will jump into detailed topics such as histograms and contours and finally go over some real-life applications of computer vision – face processing, object detection, object tracking, 3D reconstruction, and so on. This is going to be a long journey, but we will get through it together.
We love looking at high-resolution color photographs, thanks to the gamut of colors they offer. Not so long ago, however, photos were printed only in black and white. Even those "black-and-white" photos had some color in them, the only difference being that the colors were all shades of gray. What color photographs, grayscale photographs, and videos have in common is the vision part, and that's where computer vision gets its name. Computer refers to the fact that it's the computer that is processing the visual data, while vision refers to the fact that we are dealing with visual data – images and videos.
An image is made up of smaller components called pixels. A video is made up of multiple frames, each of which is nothing but an image. The following diagram gives us an idea of the various components of videos, images, and pixels:
Figure 1.1: Relationships between videos, images, and pixels
In this chapter, we will focus only on images and pixels. We will also go through an introduction to the OpenCV library, along with the functions present in the library that are commonly used for basic image processing. Before we jump into the details of images and pixels, let's get through the prerequisites, starting with NumPy arrays. The reason behind this is that images in OpenCV are nothing but NumPy arrays. Just as a quick recap, NumPy is a Python module that's used for numerical computations and is well known for its high-speed computations.
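As a rough sketch of the relationship between videos, images, and pixels, the snippet below builds a synthetic video purely from NumPy arrays. The sizes (3 frames of 4x5 pixels with 3 channels) and the variable names are made up for illustration, and no real image or video file is loaded.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_frames, height, width, channels = 3, 4, 5, 3

# A "video" here is a stack of frames; each frame is an image.
video = np.zeros((num_frames, height, width, channels), dtype=np.uint8)

# An image is a single frame: a (height, width, channels) array.
image = video[0]

# A pixel is one (row, column) position in the image; in a
# three-channel image it holds three values.
pixel = image[2, 1]

print(video.shape)  # (3, 4, 5, 3)
print(image.shape)  # (4, 5, 3)
print(pixel)        # [0 0 0]
```

Indexing the array one level at a time mirrors the hierarchy: a frame out of the video, then a pixel out of the frame.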

NumPy Arrays

Let's learn how to create a NumPy array in Python.
First, we need to import the NumPy module using the import numpy as np command. Here, np is used as an alias. This means that instead of writing numpy.function_name, we can use np.function_name.
We will have a look at four ways of creating a NumPy array:
  • Array filled with zeros – the np.zeros command
  • Array filled with ones – the np.ones command
  • Array filled with random numbers – the np.random.rand command
  • Array filled with values specified – the np.array command
Let's start with the np.zeros and np.ones commands. There are two important arguments for these functions:
  • The shape of the array. For a 2D array, this is (number of rows, number of columns).
  • The data type of the elements. By default, these functions create arrays of floating-point numbers (float64). For images, we will use unsigned 8-bit integers – np.uint8. The reason behind this is that 8-bit unsigned integers have a range of 0 to 255, which is the same range that's followed for pixel values.
Let's have a look at a simple example of creating an array full of zeros. The array size should be 4x3. The shape is passed as a single tuple, so we can do this by using np.zeros((4,3)). Similarly, if we want to create a 4x3 array full of ones, we can use np.ones((4,3)).
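A minimal sketch of these two calls follows. The variable names are illustrative. Note that the shape goes in as one tuple; if the rows and columns are passed as two separate positional arguments, NumPy treats the second one as the data type and raises an error.

```python
import numpy as np

# The shape is a single tuple: (rows, columns).
zeros_float = np.zeros((4, 3))                # default dtype is float64
ones_uint8 = np.ones((4, 3), dtype=np.uint8)  # dtype suited to pixel values

print(zeros_float.dtype)  # float64
print(ones_uint8.dtype)   # uint8

# np.iinfo reports the representable range of an integer type.
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255
```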
The np.random.rand function, on the other hand, only needs the shape of the array. For a 2D array, it will be provided as np.random.rand(number_of_rows, number_of_columns).
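As a small illustration, the sketch below creates a 4x3 random array and checks that its values fall in the half-open interval [0, 1). The last step, scaling the values to the 0 to 255 pixel range, is one possible approach added here for illustration and is not taken from the text.

```python
import numpy as np

# The dimensions are passed as separate arguments, not as a tuple.
rand_array = np.random.rand(4, 3)

print(rand_array.shape)  # (4, 3)
print(rand_array.dtype)  # float64

# Values are drawn uniformly from [0, 1).
print(rand_array.min() >= 0.0, rand_array.max() < 1.0)  # True True

# Assumption for illustration: scale to 0..255 and truncate to uint8.
rand_pixels = (rand_array * 256).astype(np.uint8)
print(rand_pixels.dtype)  # uint8
```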
Finally, for the np.array function, we provide the data as the first argument and the data type as the second argument.
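For instance, a nested Python list can be turned into a small array of pixel-like values. The numbers below are arbitrary and only serve as an example.

```python
import numpy as np

# Arbitrary example values within the 0..255 pixel range.
data = [[0, 128, 255],
        [64, 32, 16]]

# First argument: the data. Second argument: the data type.
pixels = np.array(data, dtype=np.uint8)

print(pixels)
```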
Once you have a NumPy array, you can use npArray.shape to find out the shape of the array, where npArray is the name of the NumPy array. We can also use npArray.dtype to display the data type of the elements in the array.
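A short sketch of both attributes, using a small array created only for this example:

```python
import numpy as np

npArray = np.ones((2, 3), dtype=np.uint8)

# shape is a plain tuple, so it can be unpacked directly.
rows, cols = npArray.shape
print(rows, cols)     # 2 3

# dtype describes the type of every element in the array.
print(npArray.dtype)  # uint8
```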
Let's learn how to use these functions by completing the first exercise of this chapter.

Exercise 1.01: Creating NumPy Arrays

In this exercise, we will get some hands-on experience with the various NumPy functions that are used to create NumPy arrays and to obtain their shape. We will be using NumPy's zeros, ones, and rand functions to create the arrays. We will also have a look at their data types and shapes. Follow these steps to complete this exercise:
  1. Create a new notebook and name it Exercise1.01.ipynb. This is where we will write our code.
  2. First, import the NumPy module:
    import numpy as np
  3. Next, let's create a 2D NumPy array with 5 rows and 6 columns, filled with zeros:
    npArray = np.zeros((5,6))
  4. Let's print the array we just created:
    print(npArray)
    The output is as follows:
    [[0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0.]]
  5. Next, let's print the data type of the elements of the array:
    print(npArray.dtype)
    The output is float64.
  6. Finally, let's print the shape of the array:
    print(npArray.shape)
    The output is ...

Table of contents

  1. The Computer Vision Workshop
  2. Preface
  3. 1. Basics of Image Processing
  4. 2. Common Operations When Working with Images
  5. 3. Working with Histograms
  6. 4. Working with contours
  7. 5. Face Processing in Image and Video
  8. 6. Object Tracking
  9. 7. Object Detection and Face Recognition
  10. 8. OpenVINO with OpenCV
  11. Appendix