Learning OpenCV 4 Computer Vision with Python 3
eBook - ePub

Get to grips with tools, techniques, and algorithms for computer vision and machine learning, 3rd Edition

Joseph Howse, Joe Minichino

  ‱ 372 pages
  ‱ English
  ‱ ePUB (mobile-friendly)
  ‱ Available on iOS and Android


About this book

Updated for OpenCV 4 and Python 3, this book covers the latest on depth cameras, 3D tracking, augmented reality, and deep neural networks, helping you solve real-world computer vision problems with practical code

Key Features

  • Build powerful computer vision applications in concise code with OpenCV 4 and Python 3
  • Learn the fundamental concepts of image processing, object classification, and 2D and 3D tracking
  • Train, use, and understand machine learning models such as Support Vector Machines (SVMs) and neural networks

Book Description

Computer vision is a rapidly evolving science, encompassing diverse applications and techniques. This book will not only help those who are getting started with computer vision but also experts in the domain. You'll be able to put theory into practice by building apps with OpenCV 4 and Python 3.

You'll start by understanding OpenCV 4 and how to set it up with Python 3 on various platforms. Next, you'll learn how to perform basic operations such as reading, writing, manipulating, and displaying still images, videos, and camera feeds. The book takes you through image processing, video analysis, depth estimation, and segmentation, and helps you gain practice by building a GUI app, ensuring plenty of opportunities for hands-on activities. Next, you'll tackle two popular challenges: face detection and face recognition. You'll also learn about object classification and machine learning concepts, which will enable you to create and use object detectors and classifiers, and even track objects in movies or a video camera feed. Later, you'll develop your skills in 3D tracking and augmented reality. Finally, you'll cover ANNs and DNNs, learning how to develop apps for recognizing handwritten digits and classifying a person's gender and age.

By the end of this book, you'll have the skills you need to execute real-world computer vision projects.

What you will learn

  • Install and familiarize yourself with OpenCV 4's Python 3 bindings
  • Understand image processing and video analysis basics
  • Use a depth camera to distinguish foreground and background regions
  • Detect and identify objects, and track their motion in videos
  • Train and use your own models to match images and classify objects
  • Detect and recognize faces, and classify their gender and age
  • Build an augmented reality application to track an image in 3D
  • Work with machine learning models, including SVMs, artificial neural networks (ANNs), and deep neural networks (DNNs)

Who this book is for

If you are interested in learning computer vision, machine learning, and OpenCV in the context of practical real-world applications, then this book is for you. It will also be useful for anyone getting started with computer vision, as well as experts who want to stay up to date with OpenCV 4 and Python 3. Although no prior knowledge of image processing, computer vision, or machine learning is required, familiarity with basic Python programming is a must.


Information

Year
2020
ISBN
9781789530643

Camera Models and Augmented Reality

If you like geometry, photography, or 3D graphics, then this chapter's topics should especially appeal to you. We will learn about the relationship between 3D space and a 2D projection. We will model this relationship in terms of the basic optical parameters of a camera and lens. Finally, we will apply the same relationship to the task of drawing 3D shapes in an accurate perspective projection. Throughout all of this, we will integrate our previous knowledge of image matching and object tracking in order to track 3D motion of a real-world object whose 2D projection is captured by a camera in real time.
On a practical level, we will build an augmented reality application that uses information about a camera, an object, and motion in order to superimpose 3D graphics on top of a tracked object in real time. To achieve this, we will conquer the following technical challenges:
  • Modeling the parameters of a camera and lens
  • Modeling a 3D object using 2D and 3D keypoints
  • Detecting the object by matching keypoints
  • Finding the object's 3D pose using the cv2.solvePnPRansac function
  • Smoothing the 3D pose using a Kalman filter
  • Drawing graphics atop the object
Over the course of this chapter, you will acquire skills that will serve you well if you go on to build your own augmented reality engine or any other system that relies on 3D tracking, such as a robotic navigation system.

Technical requirements

This chapter uses Python, OpenCV, and NumPy. Please refer back to Chapter 1, Setting Up OpenCV, for installation instructions.
The completed code and sample videos for this chapter can be found in this book's GitHub repository, https://github.com/PacktPublishing/Learning-OpenCV-4-Computer-Vision-with-Python-Third-Edition, in the chapter09 folder.
This chapter's code contains excerpts from an open source demo project called Visualizing the Invisible, by Joseph Howse (one of this book's authors). To learn more about this project, please visit its repository at https://github.com/JoeHowse/VisualizingTheInvisible/.

Understanding 3D image tracking and augmented reality

We have already solved problems involving image matching in Chapter 6, Retrieving Images and Searching Using Image Descriptors. Moreover, we have solved problems involving continuous tracking in Chapter 8, Tracking Objects. Therefore, we are familiar with many of the components of an image tracking system, though we have not yet tackled any 3D tracking problems.
So, what exactly is 3D tracking? Well, it is the process of continually updating an estimate of an object's pose in a 3D space, typically, in terms of six variables: three variables to represent the object's 3D translation (that is, position) and the other three variables to represent its 3D rotation.
A more technical term for 3D tracking is 6DOF tracking – that is, tracking with 6 degrees of freedom, meaning the 6 variables we just mentioned.
There are several different ways of representing the 3D rotation as three variables. Elsewhere, you might have encountered various kinds of Euler angle representations, which describe the 3D rotation in terms of three separate 2D rotations around the x, y, and z axes in a particular order. OpenCV does not use Euler angles to represent 3D rotation; instead, it uses a representation called the Rodrigues rotation vector. Specifically, OpenCV uses the following six variables to represent the 6DOF pose:
  1. tx: This is the object's translation along the x axis.
  2. ty: This is the object's translation along the y axis.
  3. tz: This is the object's translation along the z axis.
  4. rx: This is the first element of the object's Rodrigues rotation vector.
  5. ry: This is the second element of the object's Rodrigues rotation vector.
  6. rz: This is the third element of the object's Rodrigues rotation vector.
Unfortunately, in the Rodrigues representation, there is no easy way to interpret rx, ry, and rz separately from each other. Taken together, as the vector r, they encode both an axis of rotation and an angle of rotation about this axis. Specifically, the following formulas define the relationship among the r vector; an angle, Ξ; a normalized axis vector, r̂; and a 3 x 3 rotation matrix, R:

Ξ = ||r||
r̂ = r / Ξ
R = cos(Ξ) I + (1 - cos(Ξ)) r̂ r̂áµ€ + sin(Ξ) [r̂]×

Here, I is the 3 x 3 identity matrix and [r̂]× is the skew-symmetric cross-product matrix of r̂.
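To make the Rodrigues representation concrete, here is a minimal pure-NumPy sketch of the conversion from a rotation vector to a rotation matrix, where Ξ is the vector's magnitude, r̂ is its direction, and R = cos(Ξ)I + (1 - cos(Ξ))r̂r̂ᔀ + sin(Ξ)[r̂]×. (In practice, OpenCV's cv2.Rodrigues function performs this conversion in both directions; this sketch is only for illustration.)

```python
import numpy as np

def rodrigues_to_matrix(r):
    """Convert a Rodrigues rotation vector to a 3x3 rotation matrix."""
    r = np.asarray(r, dtype=np.float64)
    theta = np.linalg.norm(r)        # angle of rotation
    if theta < 1e-12:
        return np.eye(3)             # no rotation
    r_hat = r / theta                # normalized axis of rotation
    # Skew-symmetric cross-product matrix of the axis.
    K = np.array([[0.0,      -r_hat[2],  r_hat[1]],
                  [r_hat[2],  0.0,      -r_hat[0]],
                  [-r_hat[1], r_hat[0],  0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(r_hat, r_hat)
            + np.sin(theta) * K)

# A rotation of 90 degrees about the z axis maps the x axis to the y axis.
R = rodrigues_to_matrix([0.0, 0.0, np.pi / 2])
```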
As OpenCV programmers, we are not obliged to compute or interpret any of these variables directly. OpenCV provides functions that give us a Rodrigues rotation vector as a return value, and we can pass this rotation vector to other OpenCV functions as an argument – without ever needing to manipulate its contents for ourselves.
For our purposes (and, indeed, for many problems in computer vision), the camera is the origin of the 3D coordinate system. Therefore, in any given frame, the camera's current tx, ty, tz, rx, ry, and rz values are all defined to be 0. We will endeavor to track other objects relative to the camera's current pose.
Of course, for our edification, we will want to visualize the 3D tracking results. This brings us into the territory of augmented reality (AR). Broadly speaking, AR is the process of continually tracking relationships between real-world objects and applying these relationships to virtual objects, in such a way that a user perceives the virtual objects as being anchored to something in the real world. Typically, visual AR is based on relationships in terms of 3D space and perspective projection. Indeed, our case is typical; we want to visualize a 3D tracking result by drawing a projection of some 3D graphics atop the object we tracked in the frame.
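To illustrate perspective projection before we return to it, here is a minimal NumPy sketch of a pinhole camera model mapping a 3D point (in camera coordinates) to 2D pixel coordinates. The focal length and principal point values are illustrative assumptions, not parameters from the book's demo:

```python
import numpy as np

# Illustrative pinhole camera intrinsics (assumed values):
# fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
fx = fy = 800.0
cx, cy = 320.0, 240.0
camera_matrix = np.array([[fx,  0.0, cx],
                          [0.0, fy,  cy],
                          [0.0, 0.0, 1.0]])

def project(point_3d):
    """Project a 3D point (camera coordinates, z > 0) to 2D pixel coordinates."""
    uvw = camera_matrix @ np.asarray(point_3d, dtype=np.float64)
    return uvw[:2] / uvw[2]   # perspective division by depth

# A point 2 units in front of the camera and 0.5 units to the right:
u, v = project([0.5, 0.0, 2.0])
```

Note how the division by depth makes distant objects appear smaller, which is the essence of perspective projection.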
We will return to the concept of perspective projection in a few moments. Meanwhile, let's take an overview of a typical set of steps involved in 3D image tracking and visual AR:
  1. Define the parameters of the camera and lens. We will introduce this topic in this chapter.
  2. Initialize a Kalman filter that we will use to stabilize the 6DOF tracking results. For more information about Kalman filtering, refer back to Chapter 8, Tracking Objects.
  3. Choose a reference image, representing the surface of the object we want to track. For our demo, the object will be a plane, such as a piece of paper on which the image is printed.
  4. Create a list of 3D points, representing the vertices of the object. The coordinates can be in any unit, such as meters, millimeters, or something arbitrary. For example, you could arbitrarily define 1 unit to be equal to the object's height.
  5. Extract feature descriptors from the reference image. For 3D tracking applications, ORB is a popular choice of descriptor since it can be computed in real time, even on modest hardware such as smartphones. Our demo will use ORB. For more information about ORB, refer back to Chapter 6, Retrieving Images and Searching Using Image Descriptors.
  6. Convert the feature descriptors from pixel coordinates to 3D coordinates, using the same mapping that we used in step 4.
  7. Start capturing frames from the camera. For each frame, perform the following steps:
    1. Extract feature descriptors, and attempt to find good matches between the reference image and the frame. Our demo will use FLANN-based matching with a ratio test. For more information about these approaches for matching descriptors, refer back to Chapter 6, Retrieving Images and Searching Using Image Descriptors.
    2. If an insufficient number of good matches were found, continue to the next frame. Otherwise, proceed with the remaining steps.
    3. Attempt to find a good estimate of the tracked object's 6DOF pose based on the camera and lens parameters, the matches, and the 3D model of the reference object. For this, we will use the cv2.solvePnPRansac function.
    4. Apply the Kalman filter to stabilize the 6DOF pose so that it does not jitter too much from frame to frame.
    5. Based on the camera and lens parameters, and the 6DOF tracking results, draw a projection of some 3D grap...

Table des matiĂšres