Group and Crowd Behavior for Computer Vision
eBook - ePub

  1. 438 pages
  2. English

About this book

Group and Crowd Behavior for Computer Vision provides a multidisciplinary perspective on how to solve the problem of group and crowd analysis and modeling, combining insights from the social sciences with technological ideas in computer vision and pattern recognition.
The book addresses many unresolved issues in group and crowd behavior. Part One provides an introduction to the problems of analyzing groups and crowds, stressing that they should not be considered as completely diverse entities, but as an aggregation of people.
Part Two focuses on features and representations with the aim of recognizing the presence of groups and crowds in image and video data. It discusses low-level processing methods to determine when and where a group or crowd is located in the scene, ranging from the use of people detectors to more ad-hoc strategies for identifying group and crowd formations.
Part Three discusses methods for analyzing the behavior of groups and crowds once they have been detected, showing how to extract semantic information, predict and track the movement of a group, detect the formation or disaggregation of a group or crowd, and identify different kinds of groups and crowds depending on their behavior.
The final section focuses on identifying and promoting datasets for group and crowd analysis and modeling, presenting and discussing metrics for evaluating the pros and cons of the various models and methods.
This book gives computer vision researchers techniques for segmentation and grouping, tracking and reasoning for solving group and crowd modeling and analysis, as well as more general problems in computer vision and machine learning.
- Presents the first book to cover the topic of modeling and analysis of groups in computer vision
- Discusses the topics of group and crowd modeling from a cross-disciplinary perspective, using social science anthropological theories translated into computer vision algorithms
- Focuses on group and crowd analysis metrics
- Discusses real industrial systems dealing with the problem of analyzing groups and crowds

Group and Crowd Behavior for Computer Vision by Vittorio Murino, Marco Cristani, Shishir Shah, and Silvio Savarese is available in PDF and/or ePUB format and is listed under Computer Science & Digital Media.

Chapter 1

The Group and Crowd Analysis Interdisciplinary Challenge

Vittorio Murino*; Marco Cristani†; Shishir Shah‡; Silvio Savarese§
*Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia, Genova, Italy
†University of Verona, Verona, Italy
‡University of Houston, Houston, TX, USA
§Stanford University, Stanford, CA, USA

Abstract

This book highlights the study of groups of people as the primary focus of research in conjunction with crowds. Crowds are formed primarily by groups, and not only by single individuals, so the focus on groups is beneficial to understanding crowds, and vice versa. The subject matter covered in this book aims to address a highly focused problem with a strong multidisciplinary appeal to practitioners in both fundamental research and applications. This book is dedicated to solving the problem of group and crowd analysis and modeling in computer vision, pattern recognition and social sciences, and highlighting the open issues and challenges. Despite aiming to address a highly focused problem, the techniques covered in this book, e.g. techniques of segmentation and grouping, tracking and reasoning, are highly applicable to other more general problems in computer vision and machine learning.

Keywords

Groups and crowds; Human behavior; Image and video analysis

1.1 The Study of Groups and Crowds

Understanding activities and human behavior from images and videos is an active research area in computer vision and has a large impact on many real-world applications. These include surveillance, assistive robotics, autonomous driving, and data analytics, to name a few. The research community has put significant focus on analyzing the behavior of individuals and has proposed methods that can understand and predict the behavior of humans considered in isolation. More recently, however, attention has shifted to the new issues of analyzing and modeling gatherings of people, commonly referred to as groups or crowds, depending on the number of people involved. The research done on these two topics has brought about many diverse ad-hoc methodologies and algorithms, and has led to a growing interest in this topic. This has been supported by multiple factors. Firstly, the advancement of detection and filtering strategies running on powerful hardware has encouraged the development of algorithms able to deal with hundreds of different individuals, providing results that were unthinkable just a few years ago. Concomitantly, there has been a broader availability of new types of sensors, and the possibility of mounting these sensors on cutting-edge devices, from glasses to drones. Such sensory devices have made it possible to observe people from radically different points of view, from ego-vision settings to bird's-eye views, in a genuinely ecological, noninvasive manner, and for long durations. Moreover, the advancement of social signal processing [1,2] has brought to the computer vision and pattern recognition community new models imported from the social sciences, able to read between the lines of the simple locations and velocities assumed by individuals, using advanced notions of proxemics and kinesics [1,3].
Finally, industry, governments, and small companies are asking our community for methods to understand and model groups and crowds, for public order and safety, social robotics, advanced profiling, and many other applications.
The study of groups and crowds has generally been considered as having its roots in sociology and psychology. Human behavior, in general, has been extensively studied by sociologists to understand social interactions and crowd dynamics. It has been argued that characteristics that dictate human motion constitute a complex interplay between human physical, environmental, and psychosocial characteristics. It is a common observation that people, whenever free to move about in an environment, tend to respect certain patterns of movement. More often, these patterns of movement are dominated by social mechanisms [3]. The study of groups and crowds from the computer vision perspective has typically been modeled as a three-level approach. At the low level, given a video, humans are detected [4,5], then tracked [6,7], and then tracklets are grouped to form trajectories [8]. At the mid-level, machine learning techniques are used to identify groups by clustering trajectories [9]. At the higher level, a semantic understanding of the group behavior is obtained, like classifying actions such as "walking in groups", "protesting", "group vandalism", etc.
The low-level algorithms have been widely studied in computer vision [10–12] with promising results. However, algorithms at the middle and high levels have only been explored recently.
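The mid-level step described above, identifying groups by clustering trajectories, can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes trajectories are already available as equally sampled lists of (x, y) positions, and it links two people whenever their mean distance over time stays below a threshold. The names `mean_distance`, `group_trajectories`, and `max_dist`, as well as the threshold value, are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import hypot


def mean_distance(traj_a, traj_b):
    """Average Euclidean distance between two equally sampled trajectories."""
    total = sum(hypot(ax - bx, ay - by)
                for (ax, ay), (bx, by) in zip(traj_a, traj_b))
    return total / len(traj_a)


def group_trajectories(trajectories, max_dist=1.5):
    """Group people whose trajectories stay close to each other.

    trajectories: dict mapping a person id to a list of (x, y) positions,
    one per frame, all of the same length (a simplifying assumption).
    Returns a list of sets of person ids, one set per group.
    """
    # Union-find structure: every person starts in their own group.
    parent = {pid: pid for pid in trajectories}

    def find(pid):
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]  # path compression
            pid = parent[pid]
        return pid

    # Merge the groups of every pair that stays within max_dist on average.
    for a, b in combinations(trajectories, 2):
        if mean_distance(trajectories[a], trajectories[b]) < max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for pid in trajectories:
        groups.setdefault(find(pid), set()).add(pid)
    return list(groups.values())


# Toy data: p1 and p2 walk side by side, p3 walks elsewhere.
tracks = {
    "p1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "p2": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    "p3": [(10.0, 5.0), (9.0, 5.0), (8.0, 5.0)],
}
groups = group_trajectories(tracks)
```

On this toy input, p1 and p2 end up in one group and p3 in another. A practical system would likely also compare velocities and handle trajectories of different lengths, which this sketch deliberately leaves out.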
Algorithms at the higher level can either explicitly model human behavior and interactions within the group and with the environment, or learn a model from observations, on the assumption that human behavior is captured by the learning process. Khan and Shah [13] observed and learned a group's rigid formation structure to classify the activity and successfully applied it to parades. Ryoo and Aggarwal [14,15] represented and learned various types of complex group activities with a programming language-like representation, and then recognized the displayed activities based on the recognition of activities of individual group members. On the other hand, human behavioral models can be used to predict how humans interact with each other. Helbing and Molnar [16] proposed the social force model, which treats humans as particles and models the influence of other humans and the environment as forces. Furthermore, Pellegrini et al. [17] and Choi et al. [18], as well as [19,20], proposed models that anticipate and avoid collisions of a human with other humans and with the physical structure of the scene. These models assume that the humans partaking in the group follow the existing social norms and hence can be used to model specific categories of people, and even crowds.
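The idea of treating pedestrians as particles subject to forces can be sketched in a few lines. The following is a simplified variant, not the exact formulation of [16]: it assumes unit mass, a driving term that relaxes the current velocity toward a desired velocity pointing at the goal, and an isotropic repulsion that decays exponentially with distance. The function name `social_force` and all parameter values (`desired_speed`, `tau`, `strength`, `range_`) are illustrative assumptions.

```python
from math import exp, hypot


def social_force(pos, vel, goal, others,
                 desired_speed=1.3, tau=0.5, strength=2.0, range_=0.3):
    """Return the force (fx, fy) acting on one pedestrian of unit mass.

    pos, vel, goal: (x, y) tuples for position, velocity and destination.
    others: list of (x, y) positions of the other pedestrians.
    All parameter defaults are illustrative, not calibrated values.
    """
    # Driving term: relax toward the desired velocity within time tau.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = hypot(gx, gy) or 1.0  # avoid dividing by zero at the goal
    fx = (desired_speed * gx / dist_goal - vel[0]) / tau
    fy = (desired_speed * gy / dist_goal - vel[1]) / tau

    # Repulsive term: each neighbor pushes away, decaying with distance.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = hypot(dx, dy)
        if d == 0.0:
            continue  # coincident positions give no defined direction
        magnitude = strength * exp(-d / range_)
        fx += magnitude * dx / d
        fy += magnitude * dy / d
    return fx, fy


# A pedestrian at rest heading toward a goal on the x axis.
free = social_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [])
# The same pedestrian with someone standing one metre ahead.
blocked = social_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [(1.0, 0.0)])
```

In this example the forward force is smaller in the `blocked` case than in the `free` case, since the neighbor ahead pushes back along the walking direction. Forces from walls and obstacles, which the model also covers, would be added as further terms of the same kind.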
Typically, in crowded scenes, people are engaged in multiple activities resulting from inter- and intra-group interactions. This poses a rather challenging problem in analyzing group events due to variations in the number of people involved, and more specifically the different human actions and social interactions exhibited among people and groups [21–24]. Understanding groups and their activities is not limited to analyzing the movements of individuals in a group. The environment in which these groups exist provides important contextual information that can be invaluable in recognizing activities in crowded scenes [25,26]. Perspectives from sociology and psychology embedded into computer vision algorithms show that human activities can be effectively understood by considering implicit cognitive processes as latent variables that drive positioning, proximity to other people, movement, gesturing, etc. [16,27–30]. For example, exploring the spatial and directional relationships between people can facilitate the detection of social interactions in a group. Thus, activity analysis in low-density crowded scenes can often be considered a multistep process, one that involves individual person activity, individuals forming meaningful groups, interaction between individuals, and interactions between groups [28]. In general, the approaches to group activity analysis can be classified into two categories: bottom-up and top-down. Bottom-up (BU) approaches rely on recognizing the activity of each individual in a group. Conversely, top-down (TD) methods recognize group activity by analyzing at the group level rather than at the individual level. Since BU algorithms address the understanding of activities at the individual level, they are limited in recognizing activities at the group level. TD approaches, in turn, show better contextual understanding of the activities of a group as a whole, but they are not robust enough to recognize activities at the individual level.
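The remark above about spatial and directional relationships can be made concrete with a small sketch. It assumes each person is described by a ground-plane position and a heading angle in radians, and it flags a pair as a candidate interaction when the two are close and roughly facing each other. The function names and both thresholds are hypothetical and are not taken from any of the cited works.

```python
from math import atan2, cos, hypot, pi, sin


def angle_difference(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = a - b
    return abs(atan2(sin(d), cos(d)))


def facing_each_other(pos_a, heading_a, pos_b, heading_b,
                      max_dist=2.0, max_angle=pi / 4):
    """Return True if two people are close and oriented toward each other.

    pos_*: (x, y) ground-plane positions; heading_*: orientation in radians.
    max_dist and max_angle are illustrative thresholds, not calibrated ones.
    """
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    if hypot(dx, dy) > max_dist:
        return False
    # Direction from A to B, and the opposite direction from B to A.
    a_to_b = atan2(dy, dx)
    b_to_a = atan2(-dy, -dx)
    return (angle_difference(heading_a, a_to_b) <= max_angle
            and angle_difference(heading_b, b_to_a) <= max_angle)


# Two people one metre apart, looking at each other.
talking = facing_each_other((0.0, 0.0), 0.0, (1.0, 0.0), pi)
# Same positions, but the second person looks away.
queueing = facing_each_other((0.0, 0.0), 0.0, (1.0, 0.0), 0.0)
```

Here `talking` is True and `queueing` is False. A pairwise test of this kind is only a bottom-up cue; deciding which people form a group would still require aggregating such cues across all pairs and over time.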
On the other hand, when the density of people becomes too high, which also depends on the camera perspective, individuals and even groups can no longer be distinguished, and a more holistic analysis should be performed to figure out the behavior of a crowd. The analysis of crowd scenes can be categorized into three main topics, i.e., (i) crowd density estimation and people counting, (ii) tracking in crowds, and (iii) modeling crowd behaviors [31]. Recently, some works on pedestrian path prediction in crowded scenes were also proposed [32]. The goal of these methods is to predict the pedestrian pathway in advance, given the past walking history and the surrounding environment (obstacles, scene geometry, etc.). This is yet another interesting application in crowd scenarios, with the aim of, for instance, estimating entry/exit points in a specific area or finding the main walking pathways or standing areas, so that this information can be used to set up open spaces.
Estimating the number of people in a crowd is a cardinal stage for several real-world applications such as safety control, monitoring public transportation, crowd simulation and rendering for animation, and urban planning. Many interesting works in the literature address this target [33–35]. However, automated crowd density estimation remains an open problem in computer vision due to extreme occlusion conditions and visual ambiguities of human appearance in such scenarios [36].
Tracking individuals (or objects) in crowd scenes is another challenging task [37,38] that involves, besides severe occlusions, cluttered backgrounds and pattern deformations, all common difficulties in visual tracking. In practice, the efficiency and effectiveness of crowd trackers depend largely on crowd density and dynamics, people's social interactions, and the crowd's psychological characteristics [36,39,40].
Typically, the primary goal of modeling crowd behaviors is to allow the identification of abnormal events such as, for instance, riots, panic, and violent acts [41]. Despite recent works in this direction, detecting crowd abnormalities remains an open and challenging problem, mainly because of the "loose" definition of abnormality, which is strongly context dependent [42,43]. For example, riding a bike in a street is a normal action, whereas it may be considered abnormal in another scene with a different context such as a park or a sidewalk. Similarly, people gathering for a social event is typically a normal situation, while a similar gathering to "protest against something" can be an abnormal event, which may deserve attention and needs to be detected. Several methods have been devised to analyze crowd behavior. One of the most influential works, proposed by Mehran et al. [44], is derived from the Social Force Model (SFM) [16]. It adopted the SFM and a particle advection scheme for detecting and localizing abnormal behavior in crowd videos. To this end, it considered the entire crowd as a set of moving particles whose interaction force was computed using the SFM. The interaction force mapped onto the video frames identifies the force flow of each particle and is used as the basis for extracting features which, along with a bag-of-words strategy, are used to classify each frame as either normal or a...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. About the Editors
  6. Chapter 1: The Group and Crowd Analysis Interdisciplinary Challenge
  7. Part 1: Features and Representations
  8. Chapter 2: Social Interaction in Temporary Gatherings
  9. Chapter 3: Group Detection and Tracking Using Sociological Features
  10. Chapter 4: Exploring Multitask and Transfer Learning Algorithms for Head Pose Estimation in Dynamic Multiview Scenarios
  11. Chapter 5: The Analysis of High Density Crowds in Videos
  12. Chapter 6: Tracking Millions of Humans in Crowded Spaces
  13. Chapter 7: Subject-Centric Group Feature for Person Reidentification
  14. Part 2: Group and Crowd Behavior Modeling
  15. Chapter 8: From Groups to Leaders and Back
  16. Chapter 9: Learning to Predict Human Behavior in Crowded Scenes
  17. Chapter 10: Deep Learning for Scene-Independent Crowd Analysis
  18. Chapter 11: Physics-Inspired Models for Detecting Abnormal Behaviors in Crowded Scenes
  19. Chapter 12: Activity Forecasting
  20. Part 3: Metrics, Benchmarks and Systems
  21. Chapter 13: Integrating Computer Vision Algorithms and Ontologies for Spectator Crowd Behavior Analysis
  22. Chapter 14: SALSA: A Multimodal Dataset for the Automated Analysis of Free-Standing Social Interactions
  23. Chapter 15: Zero-Shot Crowd Behavior Recognition
  24. Chapter 16: The GRODE Metrics
  25. Chapter 17: Realtime Pedestrian Tracking and Prediction in Dense Crowds
  26. Subject Index