1 Introduction
Look in, let not either the proper quality, or the true worth of anything pass thee, before thou hast fully apprehended it.
—MARCUS AURELIUS Meditations, 170–180 AD (Translated by Meric Casaubon, 1634)
This book presents selected object detection and recognition methods in computer vision, joining theory, implementation, and applications. The majority of the selected methods were used in real automotive vision systems. Two groups of methods are distinguished. The first group contains methods based on tensors, which in the last decade have opened new frontiers in image processing and pattern analysis. The second group builds on mathematical statistics. In many cases, object detection and recognition methods draw from both groups. As indicated in the title, explaining the main concepts of the methods and presenting their mathematical derivations is as important as their implementation and use in real applications. Object detection and recognition are closely connected: to some extent both can be seen as pattern classification, and detection frequently precedes recognition. Nevertheless, we make a distinction between the two. In our definition, object detection mostly concerns answering the question of whether a given type of object is present in an image. Sometimes the object's current appearance and position are also important. The goal of object recognition, on the other hand, is to determine the object's particular type. For instance, we can detect a face and then identify the particular person. Similarly, in a road sign recognition system, detecting some signs unambiguously reveals their category, as with the “Yield” sign. For the majority of signs, however, we first detect their characteristic shapes and then identify their particular type, such as “40 km/h speed limit,” and so forth.
Detection and recognition of objects in observed scenes is a natural biological ability. People and animals perform it effortlessly in daily life to move without collisions, find food, avoid threats, and so on. However, similar computer methods and algorithms for scene analysis are not as straightforward, despite their unprecedented development. Nevertheless, close observation and analysis of biological systems provide some hints for their machine realization. A good example is artificial neural networks, which in their diversity resemble biological systems of neurons and which, in their software realization, are frequently used by computers to recognize objects. This is how the branch of computer science called computer vision (CV) developed. Its main objective is to make computers see as humans do, or even better. Sometimes this becomes possible.
Due to technological breakthroughs, the domains of object detection and recognition have changed so dynamically that preparing even a multivolume publication covering the majority of important subjects in this area seems impossible. Each month hundreds of new papers are published with new ideas, theorems, algorithms, etc. On the other hand, the fastest and most ample source of information is the Internet. One can easily look up almost any subject on a myriad of webpages, such as Wikipedia. So, nowadays the purpose of writing a book on computer vision has to be stated somewhat differently than even a few years ago. The difference between having ample information and having knowledge and experience becomes especially important when we face a new technological problem and our task is to solve it or to design a system that will do this for us. In this case we need a way of thinking that helps us understand the nature of the problem, as well as a methodology that brings us closer to a potential solution. This book grew in just this way, alongside my work on different projects related to object recognition in images. To be able to apply a given method, we first need to understand it. At this stage, not just a final formula summarizing a method but also its detailed mathematical background is of great use. On the other hand, bare formulas do not yet solve the problem. We need their implementations. This is the second stage, sometimes requiring more time and work than the first. One of the main goals of this book is to join the two domains for a selected set of useful methods of object detection and recognition. In this respect I hope this book will be of practical use, both for self-study and as a reference when working on a concrete problem. We are not able to go through all stages of all the methods, but I hope the book will provide at least a solid start for further study and development in this fascinating and dynamically changing area.
As indicated in the title, one of my goals was to join theory and practice. My experience is that such a combination leads to an in-depth understanding of the subject. This is further underpinned by case studies of mostly automotive applications of object detection and recognition. Thus, sections of this book can be grouped as follows:
- Presentations of methods, their main concepts, and mathematical background.
- Method implementations, which contain C++ code listings (sections of this type are indicated with the word IMPLEMENTATION).
- Analysis of special applications (their names start with CASE STUDY).
Apart from this, there are some special entries that contain brief explanations of mathematical concepts, with examples whose aim is to help in understanding the mathematical derivations in the surrounding sections.
A comment on code examples. I have always been convinced that in a book like this we should not spend pages on an introduction to C, C++, or other basic principles of computer science, as is sometimes the case. The reasons are at least twofold. The first is that many good books on computer science are available, for which I provide references. The second is not to divert the Reader from the main purpose of this book, which is an in-depth presentation of modern computer vision methods and algorithms. Readers who are not familiar with C++ can skip the detailed code explanations and focus on implementation on other platforms. However, there is no better way of learning a method than through practical testing and usage in applications.
This book is based on my experience gathered while working on many scientific projects. Results of these were published in a number of conference and journal articles. In this respect, two previous books are special. The first, An Introduction to 3D Computer Vision Techniques and Applications, written together with J. Paul Siebert, was published by Wiley in 2009 [1]. The second is my habilitation thesis [2], also issued in 2009 by the AGH University of Science and Technology Press in Kraków, Poland. Extended parts of the latter are contained in different sections of this book, permission for which was granted by the AGH University Press.
Most of all, I have always found involvement in scientific and industrial projects real fun and an adventure leading to self-development. I wish you the same.
1.1 A Sample of Computer Vision
In this section, let us briefly look at some applications of computer vision in driver monitoring systems, as well as in scene analysis. Both belong to the on-car Driver Assisting System, which aims to facilitate driving, for example by notifying drivers of upcoming road signs, and most of all to prevent car accidents, for example those caused by the driver falling asleep.
Figure 1.1 depicts a system of cameras mounted in a test car. The cameras can observe the driver and allow the system to monitor his or her state. Cameras can also observe the front of the car for pedestrian detection or road sign recognition, in which case they can send an image like the one presented in Figure 1.2.
What type of information can we draw from such an image? Certainly, this depends on our goal. In the real traffic situation depicted, we are mainly interested in driving the car safely, avoiding pedestrians and other vehicles, whether moving or parked, as well as spotting and reacting to traffic signals and signs. However, if someone sent us this image, we might be interested in finding out the name of the street, for instance. What can computer vision do for us? To some extent, all of the above, and soon even driving a car, at least under special conditions. Let us look at some stages of processing by computer vision methods, the details of which are discussed in the following chapters.
Let us first observe that even a single color image has three dimensions, two spatial and one spanning its color channels, as shown in Figure 1.3(a). In the case of multiple images or a video stream, the number of dimensions grows. Thus, we need tools to analyze such structures. As we will see, tensors offer new possibilities in this respect. Their recently developed decompositions also allow insight into the information contained in such multidimensional structures, as well as their compression or the extraction of features for further classification. Much research in computer vision and pattern recognition focuses on feature detection and the properties of features. In this respect, transformations are investigated that change the original intensity or color pixels into a new representation that provides some knowledge about the image contents or is more appropriate for finding specific objects. An example of applying the structural tensor to the image in Figure 1.2 to detect areas with strong local structures is shown in Figure 1.3(b). The detected structures are encoded with color: their orientation is represented by different colors, whereas their strength is represented by color saturation. Let us observe that areas with no prominent structures show no response of this filter; in Figure 1.3(b) they are simply black. As will be shown, such a representation proves very useful in finding specific objects in images, such as pedestrians, cars, or road signs.
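To make the idea of the structural tensor a little more concrete, below is a minimal C++ sketch of its computation for a grayscale image. It is not the implementation developed later in this book, and it rests on simplifying assumptions: gradients are taken with central differences, and the tensor components are summed over a box-shaped (2r+1)×(2r+1) window instead of the more common Gaussian one. The names Image, Structure, and structureTensor are hypothetical. For each pixel the sketch returns the orientation of the dominant eigenvector of the tensor and, as a strength measure, the difference of its two eigenvalues, which vanishes in flat areas.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A grayscale image stored row by row; a hypothetical stand-in for an image class.
struct Image {
    int width, height;
    std::vector<double> data;
    // Returns the pixel value, clamping coordinates at the borders.
    double at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return data[y * width + x];
    }
};

// Per-pixel result: orientation (radians) and strength of the local structure.
struct Structure {
    double angle;
    double strength;
};

// Computes T = sum over a (2r+1)x(2r+1) window of grad(I) * grad(I)^T per pixel.
std::vector<Structure> structureTensor(const Image& img, int r) {
    assert(r >= 0 && img.width > 0 && img.height > 0);
    const int w = img.width, h = img.height;
    // Central-difference gradients in x and y.
    std::vector<double> gx(w * h), gy(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            gx[y * w + x] = 0.5 * (img.at(x + 1, y) - img.at(x - 1, y));
            gy[y * w + x] = 0.5 * (img.at(x, y + 1) - img.at(x, y - 1));
        }
    std::vector<Structure> out(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            // Tensor components summed over the window (pixels outside are skipped).
            double txx = 0.0, tyy = 0.0, txy = 0.0;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int xx = x + dx, yy = y + dy;
                    if (xx < 0 || yy < 0 || xx >= w || yy >= h) continue;
                    const double ix = gx[yy * w + xx], iy = gy[yy * w + xx];
                    txx += ix * ix;
                    tyy += iy * iy;
                    txy += ix * iy;
                }
            Structure& s = out[y * w + x];
            // Angle of the dominant eigenvector, i.e. the prevailing gradient
            // direction; an edge runs perpendicular to it.
            s.angle = 0.5 * std::atan2(2.0 * txy, txx - tyy);
            // Eigenvalue difference l1 - l2: zero in flat or isotropic areas.
            s.strength = std::sqrt((txx - tyy) * (txx - tyy) + 4.0 * txy * txy);
        }
    return out;
}
```

One possible visualization in the spirit of Figure 1.3(b) would map the angle to hue and the normalized strength to saturation (and, for instance, also to brightness, so that pixels with zero strength appear black); this mapping is an assumption, not necessarily the one used for the figure.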
Let us now briefly show the possible steps that lead to the detection of road signs in the image in Figure 1.2. In this method, signs are first detected with fast segmentation by colors characteristic of the different groups of expected signs. For instance, red color segmentation is used to spot all red objects, among which could be the red rims of prohibitive signs, and so on for all colors of interest.
Figure 1.4 shows the binary maps obtained from the image in Figure 1.2 after red and blue segmentation, respectively. Many segmentation methods are discussed in this book. In this case, we used manually gathered color samples to train support vector classifiers.
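The idea of training a pixel classifier on manually gathered color samples can be illustrated with the following sketch. It is a simplification under stated assumptions, not the classifiers used to produce Figure 1.4: it uses a linear SVM on normalized RGB values (with a constant 1 appended as a bias feature), trained with the Pegasos stochastic subgradient method, and it visits the samples cyclically rather than at random so that results are reproducible. The function names and the parameters lambda and epochs are hypothetical; a real system may well use kernel SVMs or a different color space.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized R, G, B in [0, 1] plus a constant 1 that plays the role of a bias.
using Feature = std::array<double, 4>;

Feature toFeature(double r, double g, double b) { return Feature{r, g, b, 1.0}; }

double dot(const Feature& a, const Feature& b) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
    return s;
}

// Linear SVM trained with the Pegasos stochastic subgradient method.
// Labels are +1 (sought color) or -1 (any other color).
Feature trainLinearSVM(const std::vector<Feature>& samples,
                       const std::vector<int>& labels,
                       double lambda, int epochs) {
    assert(!samples.empty() && samples.size() == labels.size() && lambda > 0.0);
    Feature w{0.0, 0.0, 0.0, 0.0};
    const double radius = 1.0 / std::sqrt(lambda);
    long t = 0;
    for (int e = 0; e < epochs; ++e)
        for (std::size_t i = 0; i < samples.size(); ++i) {  // cyclic, not random
            ++t;
            const double eta = 1.0 / (lambda * t);
            // The hinge-loss check uses the weights from before this step.
            const bool violated = labels[i] * dot(w, samples[i]) < 1.0;
            for (double& wk : w) wk *= 1.0 - eta * lambda;  // regularization
            if (violated)
                for (std::size_t k = 0; k < w.size(); ++k)
                    w[k] += eta * labels[i] * samples[i][k];
            // Optional Pegasos projection onto the ball of radius 1/sqrt(lambda).
            const double norm = std::sqrt(dot(w, w));
            if (norm > radius)
                for (double& wk : w) wk *= radius / norm;
        }
    return w;
}

// Binary map: 1 where the classifier assigns the sought color, 0 elsewhere.
std::vector<int> classifyPixels(const Feature& w, const std::vector<Feature>& pixels) {
    std::vector<int> mask;
    mask.reserve(pixels.size());
    for (const Feature& p : pixels) mask.push_back(dot(w, p) > 0.0 ? 1 : 0);
    return mask;
}
```

For instance, a few reddish samples labeled +1 and samples of other colors labeled -1 are enough to train w; applying classifyPixels to every pixel of an image then yields a binary map, one per trained color class.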
From the maps in Figure 1.4, we need to select objects whose shape and size potentially correspond to the road signs we are looking for. This is done by specific methods that rely on the detection of salient points, as well as on fuzzy logic rules that define the potential shape and size of the candidate objects.
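The selection of candidate objects can be sketched as follows. Note that this is a deliberately simplified stand-in: instead of the salient point detection described above, it groups the foreground pixels of a binary map into 4-connected components and evaluates each bounding box with a single fuzzy rule, "aspect ratio is about 1 AND size is plausible," where the AND is realized as the minimum of two trapezoidal membership functions. All names and membership breakpoints are hypothetical and would need tuning to real sign sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

struct Box { int x0, y0, x1, y1; };  // inclusive pixel bounds

// Finds 4-connected components of nonzero pixels and returns their bounding boxes.
std::vector<Box> componentBoxes(const std::vector<int>& mask, int w, int h) {
    assert(static_cast<int>(mask.size()) == w * h);
    std::vector<bool> seen(mask.size(), false);
    std::vector<Box> boxes;
    for (int start = 0; start < w * h; ++start) {
        if (!mask[start] || seen[start]) continue;
        Box b{start % w, start / w, start % w, start / w};
        std::queue<int> q;  // breadth-first flood fill of one component
        q.push(start);
        seen[start] = true;
        while (!q.empty()) {
            const int p = q.front();
            q.pop();
            const int x = p % w, y = p / w;
            b.x0 = std::min(b.x0, x); b.x1 = std::max(b.x1, x);
            b.y0 = std::min(b.y0, y); b.y1 = std::max(b.y1, y);
            const int nx[4] = {x - 1, x + 1, x, x};
            const int ny[4] = {y, y, y - 1, y + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || ny[k] < 0 || nx[k] >= w || ny[k] >= h) continue;
                const int n = ny[k] * w + nx[k];
                if (mask[n] && !seen[n]) { seen[n] = true; q.push(n); }
            }
        }
        boxes.push_back(b);
    }
    return boxes;
}

// Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between.
double trapezoid(double v, double a, double b, double c, double d) {
    if (v <= a || v >= d) return 0.0;
    if (v < b) return (v - a) / (b - a);
    if (v <= c) return 1.0;
    return (d - v) / (d - c);
}

// Fuzzy rule: candidate IF aspect ratio is about 1 AND size is plausible.
// The fuzzy AND is the minimum of the two memberships; breakpoints are illustrative.
double candidateDegree(const Box& b) {
    const double bw = b.x1 - b.x0 + 1, bh = b.y1 - b.y0 + 1;
    const double muAspect = trapezoid(bw / bh, 0.5, 0.8, 1.25, 2.0);
    const double muSize = trapezoid(std::max(bw, bh), 2.0, 4.0, 60.0, 120.0);
    return std::min(muAspect, muSize);
}
```

A box would then be kept as a sign candidate if its degree exceeds a chosen threshold, such as 0.5; thin elongated blobs, for example road markings, receive a degree of 0 from the aspect-ratio term alone.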
Figure 1.5 shows the detected areas of the signs. These now need to be fed to the next classifier, which will provide the final response, first as to whether we are really observing...