Learning OpenCV 4 Computer Vision with Python 3

Get to grips with tools, techniques, and algorithms for computer vision and machine learning, 3rd Edition

Joseph Howse, Joe Minichino

About This Book

Updated for OpenCV 4 and Python 3, this book covers the latest developments in depth cameras, 3D tracking, augmented reality, and deep neural networks, helping you solve real-world computer vision problems with practical code.

Key Features

  • Build powerful computer vision applications in concise code with OpenCV 4 and Python 3
  • Learn the fundamental concepts of image processing, object classification, and 2D and 3D tracking
  • Train, use, and understand machine learning models such as Support Vector Machines (SVMs) and neural networks

Book Description

Computer vision is a rapidly evolving science, encompassing diverse applications and techniques. This book will help not only those who are getting started with computer vision but also experts in the domain. You'll be able to put theory into practice by building apps with OpenCV 4 and Python 3.

You'll start by understanding OpenCV 4 and how to set it up with Python 3 on various platforms. Next, you'll learn how to perform basic operations such as reading, writing, manipulating, and displaying still images, videos, and camera feeds. The book then takes you through image processing, video analysis, and depth estimation and segmentation, and helps you gain hands-on practice by building a GUI app. Then, you'll tackle two popular challenges: face detection and face recognition. You'll also learn about object classification and machine learning concepts, which will enable you to create and use object detectors and classifiers, and even track objects in movies or a video camera feed. Later, you'll develop your skills in 3D tracking and augmented reality. Finally, you'll cover ANNs and DNNs, learning how to develop apps for recognizing handwritten digits and classifying a person's gender and age.

By the end of this book, you'll have the skills you need to execute real-world computer vision projects.

What you will learn

  • Install and familiarize yourself with OpenCV 4's Python 3 bindings
  • Understand image processing and video analysis basics
  • Use a depth camera to distinguish foreground and background regions
  • Detect and identify objects, and track their motion in videos
  • Train and use your own models to match images and classify objects
  • Detect and recognize faces, and classify their gender and age
  • Build an augmented reality application to track an image in 3D
  • Work with machine learning models, including SVMs, artificial neural networks (ANNs), and deep neural networks (DNNs)

Who this book is for

If you are interested in learning computer vision, machine learning, and OpenCV in the context of practical real-world applications, then this book is for you. This OpenCV book will also be useful for anyone getting started with computer vision, as well as experts who want to stay up to date with OpenCV 4 and Python 3. Although no prior knowledge of image processing, computer vision, or machine learning is required, familiarity with basic Python programming is a must.

Camera Models and Augmented Reality

If you like geometry, photography, or 3D graphics, then this chapter's topics should especially appeal to you. We will learn about the relationship between 3D space and a 2D projection. We will model this relationship in terms of the basic optical parameters of a camera and lens. Finally, we will apply the same relationship to the task of drawing 3D shapes in an accurate perspective projection. Throughout all of this, we will integrate our previous knowledge of image matching and object tracking in order to track the 3D motion of a real-world object whose 2D projection is captured by a camera in real time.
On a practical level, we will build an augmented reality application that uses information about a camera, an object, and motion in order to superimpose 3D graphics on top of a tracked object in real time. To achieve this, we will conquer the following technical challenges:
  • Modeling the parameters of a camera and lens
  • Modeling a 3D object using 2D and 3D keypoints
  • Detecting the object by matching keypoints
  • Finding the object's 3D pose using the cv2.solvePnPRansac function
  • Smoothing the 3D pose using a Kalman filter
  • Drawing graphics atop the object
Over the course of this chapter, you will acquire skills that will serve you well if you go on to build your own augmented reality engine or any other system that relies on 3D tracking, such as a robotic navigation system.

Technical requirements

This chapter uses Python, OpenCV, and NumPy. Please refer back to Chapter 1, Setting Up OpenCV, for installation instructions.
The completed code and sample videos for this chapter can be found in this book's GitHub repository, https://github.com/PacktPublishing/Learning-OpenCV-4-Computer-Vision-with-Python-Third-Edition, in the chapter09 folder.
This chapter's code contains excerpts from an open source demo project called Visualizing the Invisible, by Joseph Howse (one of this book's authors). To learn more about this project, please visit its repository at https://github.com/JoeHowse/VisualizingTheInvisible/.

Understanding 3D image tracking and augmented reality

We have already solved problems involving image matching in Chapter 6, Retrieving Images and Searching Using Image Descriptors. Moreover, we have solved problems involving continuous tracking in Chapter 8, Tracking Objects. Therefore, we are familiar with many of the components of an image tracking system, though we have not yet tackled any 3D tracking problems.
So, what exactly is 3D tracking? Well, it is the process of continually updating an estimate of an object's pose in 3D space, typically in terms of six variables: three variables to represent the object's 3D translation (that is, position) and the other three to represent its 3D rotation.
A more technical term for 3D tracking is 6DOF tracking – that is, tracking with 6 degrees of freedom, meaning the 6 variables we just mentioned.
There are several different ways of representing the 3D rotation as three variables. Elsewhere, you might have encountered various kinds of Euler angle representations, which describe the 3D rotation in terms of three separate 2D rotations around the x, y, and z axes in a particular order. OpenCV does not use Euler angles to represent 3D rotation; instead, it uses a representation called the Rodrigues rotation vector. Specifically, OpenCV uses the following six variables to represent the 6DOF pose:
  1. tx: This is the object's translation along the x axis.
  2. ty: This is the object's translation along the y axis.
  3. tz: This is the object's translation along the z axis.
  4. rx: This is the first element of the object's Rodrigues rotation vector.
  5. ry: This is the second element of the object's Rodrigues rotation vector.
  6. rz: This is the third element of the object's Rodrigues rotation vector.
Unfortunately, in the Rodrigues representation, there is no easy way to interpret rx, ry, and rz separately from each other. Taken together, as the vector r, they encode both an axis of rotation and an angle of rotation about this axis. Specifically, the following formulas define the relationship among the r vector; an angle, $\theta$; a normalized axis vector, $\hat{\mathbf{r}}$; and a 3 x 3 rotation matrix, $R$:

$$\theta = \|\mathbf{r}\|$$

$$\hat{\mathbf{r}} = \frac{\mathbf{r}}{\theta}$$

$$R = \cos(\theta)\,I + (1 - \cos(\theta))\,\hat{\mathbf{r}}\hat{\mathbf{r}}^{T} + \sin(\theta)\,[\hat{\mathbf{r}}]_{\times}$$

Here, $I$ is the 3 x 3 identity matrix and $[\hat{\mathbf{r}}]_{\times}$ is the skew-symmetric cross-product matrix of $\hat{\mathbf{r}}$.
As OpenCV programmers, we are not obliged to compute or interpret any of these variables directly. OpenCV provides functions that give us a Rodrigues rotation vector as a return value, and we can pass this rotation vector to other OpenCV functions as an argument – without ever needing to manipulate its contents for ourselves.
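Still, it can be instructive to see the conversion in action. The following minimal sketch (not part of the book's demo; the rotation value is just an example chosen for illustration) uses cv2.Rodrigues to convert a rotation vector to a 3 x 3 rotation matrix and back again:

```python
import numpy as np
import cv2

# An example Rodrigues rotation vector: a rotation of pi/2 radians
# (90 degrees) around the z axis. The vector's direction is the
# rotation axis and its magnitude is the angle.
rotation_vector = np.array([[0.0], [0.0], [np.pi / 2.0]])

# cv2.Rodrigues converts the vector to a 3x3 rotation matrix. It
# also returns a Jacobian matrix, which we ignore here.
rotation_matrix, _ = cv2.Rodrigues(rotation_vector)
print(rotation_matrix)

# The same function converts a 3x3 rotation matrix back to a vector.
recovered_vector, _ = cv2.Rodrigues(rotation_matrix)
print(recovered_vector)
```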
For our purposes (and, indeed, for many problems in computer vision), the camera is the origin of the 3D coordinate system. Therefore, in any given frame, the camera's current tx, ty, tz, rx, ry, and rz values are all defined to be 0. We will endeavor to track other objects relative to the camera's current pose.
Of course, for our edification, we will want to visualize the 3D tracking results. This brings us into the territory of augmented reality (AR). Broadly speaking, AR is the process of continually tracking relationships between real-world objects and applying these relationships to virtual objects, in such a way that a user perceives the virtual objects as being anchored to something in the real world. Typically, visual AR is based on relationships in terms of 3D space and perspective projection. Indeed, our case is typical; we want to visualize a 3D tracking result by drawing a projection of some 3D graphics atop the object we tracked in the frame.
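To give a concrete taste of perspective projection in code before moving on, here is a minimal, self-contained sketch. The camera parameters, pose, and 3D points are illustrative placeholders, not values from the book's demo; cv2.projectPoints maps the 3D points into 2D pixel coordinates:

```python
import numpy as np
import cv2

# Hypothetical camera matrix for a 1280x720 image. fx and fy are
# focal lengths in pixel units; (cx, cy) is the principal point,
# assumed here to be the image center.
fx = fy = 800.0
cx, cy = 640.0, 360.0
camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # Assume no lens distortion.

# A 6DOF pose: no rotation, and a translation of 1 unit along the
# z axis (that is, directly in front of the camera).
rvec = np.zeros((3, 1))
tvec = np.array([[0.0], [0.0], [1.0]])

# Vertices of a small square in the object's 3D coordinate space.
object_points = np.array([[-0.1, -0.1, 0.0],
                          [0.1, -0.1, 0.0],
                          [0.1, 0.1, 0.0],
                          [-0.1, 0.1, 0.0]])

# Project the 3D vertices to 2D pixel coordinates.
image_points, _ = cv2.projectPoints(
    object_points, rvec, tvec, camera_matrix, dist_coeffs)
print(image_points.reshape(-1, 2))
```

The printed 2D points are where the square's corners would land in the image; drawing lines between them would render the square in correct perspective.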
We will return to the concept of perspective projection in a few moments. Meanwhile, let's take an overview of a typical set of steps involved in 3D image tracking and visual AR:
  1. Define the parameters of the camera and lens. We will introduce this topic in this chapter.
  2. Initialize a Kalman filter that we will use to stabilize the 6DOF tracking results. For more information about Kalman filtering, refer back to Chapter 8, Tracking Objects.
  3. Choose a reference image, representing the surface of the object we want to track. For our demo, the object will be a plane, such as a piece of paper on which the image is printed.
  4. Create a list of 3D points, representing the vertices of the object. The coordinates can be in any unit, such as meters, millimeters, or something arbitrary. For example, you could arbitrarily define 1 unit to be equal to the object's height.
  5. Extract feature descriptors from the reference image. For 3D tracking applications, ORB is a popular choice of descriptor since it can be computed in real time, even on modest hardware such as smartphones. Our demo will use ORB. For more information about ORB, refer back to Chapter 6, Retrieving Images and Searching Using Image Descriptors.
  6. Convert the feature descriptors' keypoint coordinates from pixel coordinates to 3D coordinates, using the same mapping that we used in step 4.
  7. Start capturing frames from the camera. For each frame, perform the following steps:
    1. Extract feature descriptors, and attempt to find good matches between the reference image and the frame. Our demo will use FLANN-based matching with a ratio test. For more information about these approaches for matching descriptors, refer back to Chapter 6, Retrieving Images and Searching Using Image Descriptors.
    2. If too few good matches are found, continue to the next frame. Otherwise, proceed with the remaining steps.
    3. Attempt to find a good estimate of the tracked object's 6DOF pose based on the camera and lens parameters, the matches, and the 3D model of the reference object. For this, we will use the cv2.solvePnPRansac function, as shown in the sketch after this list.
    4. Apply the Kalman filter to stabilize the 6DOF pose so that it does not jitter too much from frame to frame.
    5. Based on the camera and lens parameters and the 6DOF tracking results, draw a projection of some 3D graphics atop the tracked object.
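Here is the sketch promised in sub-step 3: a minimal, self-contained preview of the pose-estimation step. Everything in it is synthetic and illustrative, and the variable names are placeholders; in the real pipeline, the 2D points would come from FLANN-based keypoint matching rather than from a known pose:

```python
import numpy as np
import cv2

# Illustrative camera and lens parameters (not calibrated values).
camera_matrix = np.array([[800.0, 0.0, 640.0],
                          [0.0, 800.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # Assume no lens distortion.

# Hypothetical 3D keypoints on a planar reference object. In the
# real pipeline, these would come from mapping matched ORB keypoints
# into the object's 3D model (steps 4 to 6).
object_points = np.array([[-0.1, -0.1, 0.0],
                          [0.1, -0.1, 0.0],
                          [0.1, 0.1, 0.0],
                          [-0.1, 0.1, 0.0],
                          [0.0, 0.0, 0.0],
                          [0.05, -0.05, 0.0]])

# Synthesize matched 2D points by projecting the 3D points with a
# known pose, so that the solver has a consistent input to recover.
true_rvec = np.array([[0.0], [0.2], [0.0]])
true_tvec = np.array([[0.0], [0.0], [1.0]])
image_points, _ = cv2.projectPoints(
    object_points, true_rvec, true_tvec, camera_matrix, dist_coeffs)
image_points = image_points.reshape(-1, 2)

# Estimate the 6DOF pose from the 2D-3D correspondences. RANSAC
# makes the estimate robust to outliers (bad matches).
success, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, camera_matrix, dist_coeffs)

if success:
    print('Rotation vector:', rvec.flatten())
    print('Translation vector:', tvec.flatten())
```

In a real frame-by-frame loop, the resulting rvec and tvec would be passed through the Kalman filter (sub-step 4) before being used to draw the 3D graphics (sub-step 5).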
